Spatial Generalization Tests for Machine Learning-based Weather Models as a Requirement for Climate Predictions

Copernicus Publications (2026)

Authors:

Maren Höver, Milan Klöwer, Christian Schroeder de Witt, Hannah M Christensen

Abstract:

Machine learning-based weather prediction is revolutionizing weather forecasting by learning from present-day climate. However, generalization to other climates remains a major challenge. With melting sea ice, land-use change and increasing ocean temperatures, boundary conditions are changing. Therefore, generalization in time will likely only be possible if generalization in space is also given. The physics of the atmosphere is invariant in space, and as such, a model should demonstrate the same to accurately represent the real world. Here, we present three test cases to evaluate whether machine learning-based weather and climate models generalize spatially and apply them to multiple AI weather models. The tests consist of reversing the entirety of the input data and boundary conditions in latitude (Test 1), reversing them in longitude (Test 2), as well as rotating them by 180° in longitude (Test 3), while keeping all aspects of the simulation physically consistent. For a deterministic model that generalizes in space, each of these test cases yields the same predictions as the baseline case, only subject to a rounding error. With these test cases, we investigate whether data-driven models hardcode representations of spatial relationships in the training data into their latent space. We show that currently, both fully data-driven and hybrid general circulation models do not pass these tests, instead performing poorly with unphysical results. This implies that they have likely not learned underlying atmospheric physics principles, but instead local spatial relationships statistically dependent on geographical location. This calls into question the ability of such models to simulate a changing regional climate. As such, we propose that machine learning-based climate models be evaluated using our spatial tests during model development to reduce overfitting on present-day regional climate.
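
The three tests described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the toy models and the tolerance are assumptions made for illustration, and it handles only a single scalar field on a regular latitude-longitude grid. The abstract does not say how vector quantities, boundary conditions or time of day are kept physically consistent, so none of that is attempted here.

```python
import numpy as np

def flip_latitude(field):
    """Test 1 (sketch): reverse the latitude axis (second-to-last axis)."""
    return field[..., ::-1, :]

def flip_longitude(field):
    """Test 2 (sketch): reverse the longitude axis (last axis)."""
    return field[..., :, ::-1]

def rotate_longitude_180(field):
    """Test 3 (sketch): shift by half the longitude points, i.e. 180 degrees."""
    n_lon = field.shape[-1]
    if n_lon % 2 != 0:
        raise ValueError("an even number of longitude points is assumed")
    return np.roll(field, n_lon // 2, axis=-1)

def passes_spatial_test(model, field, transform, atol=1e-6):
    """Compare model(field) with the transformed run mapped back to the
    original orientation. Each transform above is its own inverse."""
    baseline = model(field)
    mapped_back = transform(model(transform(field)))
    return bool(np.allclose(baseline, mapped_back, atol=atol))

# Hypothetical toy "models", used only to exercise the check.
def local_model(field):
    """Symmetric three-point zonal average: the same rule at every grid point."""
    return (np.roll(field, 1, axis=-1) + field + np.roll(field, -1, axis=-1)) / 3.0

def location_dependent_model(field):
    """Adds a correction tied to the grid indices, i.e. to geographical location."""
    n_lat, n_lon = field.shape[-2:]
    bias = np.arange(n_lat)[:, None] + 2.0 * np.arange(n_lon)[None, :]
    return field + bias

rng = np.random.default_rng(0)
field = rng.normal(size=(8, 16))  # synthetic (lat, lon) field

tests = {
    "latitude flip": flip_latitude,
    "longitude flip": flip_longitude,
    "180 degree rotation": rotate_longitude_180,
}
results = {
    name: (
        passes_spatial_test(local_model, field, transform),
        passes_spatial_test(location_dependent_model, field, transform),
    )
    for name, transform in tests.items()
}
```

In this toy setting the location-independent rule passes all three checks up to rounding error, while the model with an index-dependent correction fails them, which mirrors the kind of behaviour the tests are designed to expose.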

Forced Component Estimation Statistical Method Intercomparison Project (ForceSMIP)

Journal of Climate American Meteorological Society (2026)

Authors:

Robert CJ Wills, Clara Deser, Karen A McKinnon, Adam Phillips, Stephen Po-Chedley, Sebastian Sippel, Anna L Merrifield, Constantin Bône, Céline Bonfils, Gustau Camps-Valls, Stephen Cropper, Charlotte Connolly, Shiheng Duan, Homer Durand, Alexander Feigin, MA Fernandez, Guillaume Gastineau, Andrei Gavrilov, Emily Gordon, Moritz Günther, Maren Höver, Sergey Kravtsov, Yan-Ning Kuo, Justin Lien, Gavin D Madakumbura, Nathan Mankovich, Matthew Newman, Jamin Rader, Jia-Rui Shi, Sang-Ik Shin, Gherardo Varando

Abstract:

Anthropogenic climate change is unfolding rapidly, yet its regional manifestation can be obscured by internal variability. A primary goal of climate science is to identify the externally forced climate response from amongst the noise of internal variability. Separating the forced response from internal variability can be addressed in climate models by using a large ensemble to average over different possible realizations of internal variability. However, with only one realization of the real world, it is a major challenge to isolate the forced response directly in observations. In the Forced Component Estimation Statistical Method Intercomparison Project (ForceSMIP), contributors used existing and newly developed statistical and machine learning methods to estimate the forced response over 1950–2022 within individual realizations of the climate system. Participants used neural networks, linear inverse models, fingerprinting methods, and low-frequency component analysis, among other approaches. These methods were trained using large ensembles from multiple climate models and then applied to observations. Here we evaluate method performance within large ensembles and investigate the estimates of the forced response in observations. Our results show that many different types of methods are skillful for estimating the forced response in climate models, though the relative skill of individual methods varies depending on the variable and evaluation metric. Methods with comparable skill in models can give a wide range of estimates of the forced response pattern in observations, illustrating the epistemic uncertainty in forced response estimates. ForceSMIP gives new insights into the forced response in observations, its uncertainty, and methods for its estimation.
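
The large-ensemble idea mentioned in this abstract, averaging over realizations of internal variability to isolate the forced response, can be sketched on synthetic data. Everything below is an illustrative assumption (the linear trend, the noise level, the ensemble size and the function name); it is not one of the ForceSMIP methods, which estimate the forced response from a single realization.

```python
import numpy as np

def split_forced_internal(ensemble):
    """Estimate the forced response as the ensemble mean.

    ensemble: array of shape (n_members, n_times).
    Returns (forced_estimate, internal), where internal holds each
    member's departure from the ensemble mean.
    """
    forced_estimate = ensemble.mean(axis=0)
    internal = ensemble - forced_estimate
    return forced_estimate, internal

# Synthetic example: one shared forced signal plus independent noise per member.
rng = np.random.default_rng(1)
years = np.arange(1950, 2023)                  # 1950-2022, as in the abstract
true_forced = 0.02 * (years - years[0])        # assumed linear forced trend
n_members = 50                                 # assumed ensemble size
noise = rng.normal(scale=0.3, size=(n_members, years.size))
ensemble = true_forced + noise

forced_estimate, internal = split_forced_internal(ensemble)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

error_ensemble_mean = rmse(forced_estimate, true_forced)
error_single_member = rmse(ensemble[0], true_forced)
```

With independent noise, the error of the ensemble mean shrinks roughly with the square root of the number of members, which is why a single realization, such as the observed record, is so much harder to separate into forced and internal parts.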

Reconstruction of last millennium sea surface temperature on 1° grid using a random forest algorithm

Global and Planetary Change 258 (2026) 105279

Authors:

Simon LL Michel, Didier Swingedouw, Juliette Mignot

Abstract:

Climate models and theoretical evidence show that the ocean drives climate from sub-decadal to centennial timescales through a variety of processes and their interactions. The range of direct climate observations, however, is too short to understand the exact role of the ocean in shaping observed and future climate variability on top of anthropogenic climate change. In the present study, we use a large set of paleoclimate records combined with a random forest algorithm to reconstruct a gridded dataset of sea surface temperatures since 850 C.E. to provide a better framework for the study of ocean surface variability. In line with modeling and paleodata studies, our reconstruction suggests that natural climate forcings have importantly influenced the last millennium climate variability. Our reconstruction also suggests that North Atlantic SST multidecadal variability influences Pacific SST on decadal timescales. However, the latter result is shown to be strongly dependent on background climate conditions. This new reconstruction offers a useful resource for testing the capabilities of climate models to reproduce the linkages between Atlantic and Pacific as well as the response to external forcings.

New insights into decadal climate variability in the North Atlantic revealed by data-driven dynamical models

Earth System Dynamics (2025)

Authors:

Andrew J. Nicoll, Hannah M. Christensen, Chris Huntingford, and Doug Smith

Abstract:

The Atlantic Multidecadal Variability (AMV) and the North Atlantic Oscillation (NAO) are the dominant modes of oceanic and atmospheric variability in the North Atlantic, respectively, and are key sources of predictability from seasonal to decadal timescales. However, the physical processes and feedback mechanisms linking the AMV and NAO, and the role of diabatic processes in these feedbacks, remain debated. We present a data-driven dynamical modelling framework which captures coupled decadal variability in AMV, NAO, and North Atlantic precipitation. Applying equation discovery methods to observational data, we identify low-order models consisting of three coupled ordinary differential equations. These models reproduce observed decadal variability and show robust out-of-sample predictive skill on multi-annual to decadal lead times. The resulting model dynamics include a distinct quasi-periodic 20-year oscillation consistent with a damped oceanic mode of variability. Notably, precipitation-related terms feature prominently in the low-order models, suggesting an important role for latent heat release and freshwater fluxes in mediating ocean–atmosphere interactions. We propose new feedback mechanisms between North Atlantic sea surface temperature and the NAO, with precipitation acting as a dynamical bridge. Overall, these results illustrate how equation discovery can provide mechanistic hypotheses and new insight beyond conventional analyses of observations and climate model simulations.
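
The abstract does not name the equation discovery method that was used. As a hedged illustration of the general idea, the sketch below applies sequentially thresholded least squares, a common sparse-regression approach, to a made-up three-variable linear system. The coefficients, the candidate library, the threshold and the assumption that time derivatives are known exactly are all illustrative choices and bear no relation to the AMV, NAO and precipitation models in the paper.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares.

    theta: (n_samples, n_features) library of candidate terms.
    dxdt:  (n_samples, n_targets) time derivatives.
    Returns a sparse coefficient matrix of shape (n_features, n_targets).
    """
    coef = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        # Refit each equation using only the terms that survived.
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                coef[keep, k] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, k], rcond=None
                )[0]
    return coef

# Hypothetical "true" system dX/dt = A X: a damped oscillation in (x, y)
# plus a decaying third variable z.
A = np.array([
    [-0.1,  2.0,  0.0],
    [-2.0, -0.1,  0.0],
    [ 0.0,  0.0, -0.3],
])

rng = np.random.default_rng(2)
states = rng.normal(size=(200, 3))      # synthetic samples of (x, y, z)
derivatives = states @ A.T              # exact derivatives (an assumption)

# Candidate library: constant term plus the three linear terms.
theta = np.column_stack([np.ones(len(states)), states])
coef = stlsq(theta, derivatives)

# Row 0 is the constant term; rows 1-3 are the x, y, z terms.
recovered_A = coef[1:].T
```

On real observational data the derivatives must be estimated from noisy time series and the candidate library would typically include nonlinear terms, so this toy recovery is far easier than the problem addressed in the paper.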

Image calibration between the Extreme Ultraviolet Imagers on Solar Orbiter and the Solar Dynamics Observatory

Astronomy and Astrophysics 703 (2025)

Authors:

C Schirninger, R Jarolim, AM Veronig, A Jungbluth, L Freischem, JE Johnson, V Delouille, L Dolla, A Spalding

Abstract:

To study and monitor the Sun and its atmosphere, various space missions have been launched in the past decades. With rapid improvement in technology and different mission requirements, the data products are subject to constant change. However, for such long-term studies as solar variability or multi-instrument investigations, uniform data series are required. In this study, we built on and expanded the instrument-to-instrument translation (ITI) framework, which provides unpaired image translations. We applied the tool to data from the Extreme Ultraviolet Imager (EUI), specifically the Full Sun Imager (FSI) on Solar Orbiter and the Atmospheric Imaging Assembly (AIA) on the Solar Dynamics Observatory (SDO). This approach allowed us to create a homogeneous dataset that combines the two extreme ultraviolet (EUV) imagers in the 174/171 Å and 304 Å channels. We demonstrate that ITI is able to provide image calibration between Solar Orbiter and SDO EUV imagers, independent of the varying orbital position of Solar Orbiter. The comparison of the intercalibrated light curves derived from 174/171 Å and 304 Å filtergrams from EUI and AIA shows that ITI can provide uniform data series that outperform a standard baseline calibration. We evaluate the perceptual similarity in terms of the Fréchet inception distance, which demonstrates that ITI achieves a significant improvement of perceptual similarity between EUI and AIA. The study provides intercalibrated observations from Solar Orbiter/EUI/FSI with SDO/AIA, enabling a homogeneous dataset suitable for solar cycle studies and multi-viewpoint investigations.