The scientific literature contains many studies evaluating numerical weather prediction (NWP) models. However, many of those studies averaged across a wide range of atmospheric conditions and surface forcings, which can obscure the conditions under which NWP models perform well and those under which they perform inadequately. To help isolate these conditions, we used observations from the U.S. Climate Reference Network (USCRN) obtained between 1 January and 31 December 2021 to distinguish among different near-surface atmospheric conditions [i.e., different near-surface heating rates (dT/dt), incoming shortwave radiation (SW_d) regimes, and 5-cm soil moisture (SM05)] and to evaluate the High-Resolution Rapid Refresh (HRRR) Model, a 3-km model used for operational weather forecasting in the United States. On days with small (large) dT/dt, we found afternoon temperature (T) biases of about 2°C (−1°C) and afternoon SW_d biases of up to 170 W m⁻² (100 W m⁻²), but negligible impacts on SM05 biases. On days with small (large) SW_d, we found daytime temperature biases of about 3°C (−2.5°C) and daytime SW_d biases of up to 190 W m⁻² (80 W m⁻²). Whereas differences in SM05 had little impact on T and SW_d biases, dry (wet) conditions had positive (negative) SM05 biases. We argue that the proper evaluation of weather forecasting models requires careful consideration of different near-surface atmospheric conditions and is critical for identifying model deficiencies and supporting improvements to the parameterization schemes used in those models. A similar regime-specific verification approach may also be used to help evaluate other geophysical models.

Significance Statement

Improving weather forecasting models requires careful evaluations against high-quality observations. We used observations from the U.S. Climate Reference Network (USCRN) and found that the performance of the High-Resolution Rapid Refresh (HRRR) Model varies as a function of differences in near-surface heating and solar radiation. This finding indicates that model evaluations need to be conducted separately under different near-surface weather conditions rather than averaged across multiple weather types. This approach will allow model developers to better identify model deficiencies and is a useful step toward improving weather forecasts.
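The idea of regime-specific verification can be illustrated with a minimal sketch. It is not the procedure used in this study: the equal-count grouping, the function names `mean_bias` and `bias_by_regime`, and all numbers below are assumptions made purely for illustration. The sketch shows how a bias pooled over all samples can hide opposite-signed biases in different regimes of a conditioning variable such as dT/dt.

```python
from statistics import mean


def mean_bias(pairs):
    """Mean forecast-minus-observation difference for (forecast, observation) pairs."""
    return mean(f - o for f, o in pairs)


def bias_by_regime(samples, n_regimes=3):
    """Split samples into equal-count regimes of a conditioning variable.

    samples: list of (regime_value, forecast, observation) tuples.
    Returns one (lowest_value, highest_value, mean_bias) tuple per regime,
    ordered from the smallest to the largest regime values.
    Equal-count grouping is an illustrative assumption, not the study's thresholds.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    size = len(ordered) // n_regimes
    if size == 0:
        raise ValueError("need at least one sample per regime")
    results = []
    for i in range(n_regimes):
        start = i * size
        # The last regime absorbs any remainder.
        stop = (i + 1) * size if i < n_regimes - 1 else len(ordered)
        chunk = ordered[start:stop]
        bias = mean_bias([(f, o) for _, f, o in chunk])
        results.append((chunk[0][0], chunk[-1][0], bias))
    return results


# Invented samples: (heating rate, forecast T, observed T). Not real data.
samples = [
    (0.2, 12.0, 10.0), (0.4, 13.0, 11.0),  # small heating rate, warm bias
    (1.0, 15.0, 15.0), (1.2, 16.0, 16.0),  # moderate heating rate, no bias
    (2.5, 19.0, 20.0), (2.8, 20.0, 21.0),  # large heating rate, cold bias
]

pooled = mean_bias([(f, o) for _, f, o in samples])
per_regime = bias_by_regime(samples)
print(f"pooled bias: {pooled:.2f}")
for low, high, bias in per_regime:
    print(f"regime {low}-{high}: bias {bias:.2f}")
```

With these invented numbers the pooled bias is about 0.33, whereas the three regimes have biases of 2.0, 0.0, and −1.0, so the pooled value understates the errors at both ends of the range.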