Hurricane Florence as Seen from the International Space Station
Major corporations have introduced a new generation of machine-learning weather models.
These models challenge the established physics-based forecasting methods refined over decades.
But how effective are these machine learning models?
Weather is a significant aspect of British life, given the frequent and dramatic shifts in conditions.
Precise weather forecasts are crucial for daily planning, but accurate predictions of severe weather matter even more, giving people time to adjust their plans, protect lives and limit damage to property.
The global economic value of accurate weather forecasts is hard to quantify, but undeniably substantial.
According to NOAA (National Oceanic and Atmospheric Administration), in the US alone, focusing on weather disasters exceeding $1bn (£740m) in damage, the 2024 impact reached $182bn, with 568 fatalities.
Since 1980, total damage from such disasters amounts to nearly $3tn.
In the UK, 2024 heatwaves resulted in 1,311 excess deaths.
A London Economics study estimated that the Met Office's meteorological services generate £56bn in economic benefits for the UK over a decade.
Extrapolating this globally, with a growing population facing increasingly extreme weather due to climate change, weather forecasting is a major industry.
Traditional forecasts rely on some of the world’s most powerful supercomputers; the Met Office’s supercomputing contract is valued at £1.2bn.
This investment provides a machine capable of 60 quadrillion (60,000,000,000,000,000) calculations per second, running a physics-based model with over a million lines of code and utilizing 215 billion weather observations.
Global models divide the planet into a grid of boxes. Resolution varies between models, with each box typically covering around 10sq km to 28sq km (3.86sq miles to 10.81sq miles).
At this resolution, individual showers cannot be pinpointed accurately, and mountains appear less pronounced than they are in reality.
The Met Office’s highest-resolution model, UKV (used for BBC TV’s 48-hour forecasts), predicts showers with 1.5km (0.9 mile) resolution. However, its computational demands restrict its use to the UK and Europe.
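To give a sense of why higher resolution is so costly, here is a rough back-of-the-envelope sketch (my own illustration, not the Met Office's figures): it simply counts how many grid boxes of the quoted sizes are needed to cover the globe.

```python
# Rough back-of-the-envelope sketch (an illustration, not official figures):
# count how many grid boxes of a given area are needed to tile the Earth's
# surface, to show why a UKV-style fine grid is far more expensive globally
# than a coarse global grid.
EARTH_SURFACE_KM2 = 510_000_000  # approximate surface area of the Earth

for name, box_area_km2 in [
    ("coarse global grid (28 sq km boxes)", 28.0),
    ("finer global grid (10 sq km boxes)", 10.0),
    ("UKV-style grid (1.5 km x 1.5 km boxes)", 1.5 * 1.5),
]:
    boxes = EARTH_SURFACE_KM2 / box_area_km2
    print(f"{name}: ~{boxes:,.0f} boxes")

# More boxes means more calculations per time step, and finer grids also need
# shorter time steps, so the total cost grows even faster than the box count.
```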
Machine-learning models are relatively recent, but their development is rapid and promising.
Traditional models require hours on expensive supercomputers, while these new models can run in under a minute on standard laptops. They bypass detailed physics, instead learning from 40 years of historical data.
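As a very rough illustration of that "learn from history" idea (my own toy example; GraphCast, AIFS and the others use large deep-learning models trained on decades of gridded reanalysis data), the sketch below fits a step-ahead mapping to a fabricated historical record and then rolls it forward as a "forecast".

```python
# Toy sketch of the data-driven idea behind models such as GraphCast or AIFS:
# instead of integrating physics equations, learn a step-ahead mapping from
# historical states. Real systems use deep neural networks on global grids;
# this toy uses a 2-variable system and linear least squares purely to
# illustrate the principle.
import numpy as np

rng = np.random.default_rng(0)

# Fabricated "historical record": a slowly rotating, slightly damped 2-D state
# (a stand-in for gridded weather fields), observed with a little noise.
A_true = 0.99 * np.array([[np.cos(0.1), -np.sin(0.1)],
                          [np.sin(0.1),  np.cos(0.1)]])
states = [np.array([1.0, 0.0])]
for _ in range(5000):
    states.append(A_true @ states[-1] + 0.01 * rng.normal(size=2))
X = np.array(states[:-1])   # state at time t
Y = np.array(states[1:])    # state at time t+1

# "Training": fit the step-ahead map by least squares (the machine-learning stand-in).
A_learned, *_ = np.linalg.lstsq(X, Y, rcond=None)

# "Forecast": start from the latest analysis and roll the learned map forward.
x = states[-1]
for step in range(1, 6):
    x = x @ A_learned
    print(f"step {step}: {x}")
```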
How Well Do They Perform?
ECMWF (European Centre for Medium-Range Weather Forecasts) atmospheric pressure data for winter 2024/2025 shows GraphCast (Google), AIFS (ECMWF), and Aurora (Microsoft) outperforming the traditional IFS (ECMWF) benchmark, while FourCastNet (Nvidia) and Pangu-Weather (Huawei) lagged behind.
Performance varies depending on the variable, and progress is rapid.
Accuracy diminishes with forecast lead time, reflecting the atmosphere’s chaotic nature. Ten-day forecasts, regardless of model type, lack sufficient accuracy to be relied upon.
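That decay of accuracy can be illustrated with a classic toy chaotic system (my example, not from the article): two runs of Lorenz's 1963 equations that start almost identically soon drift apart, which is exactly why small errors in today's observations swamp any forecast at long range.

```python
# Minimal illustration of why forecast skill decays with lead time: in a
# chaotic system such as Lorenz's 1963 model, two runs that start almost
# identically diverge, so tiny errors in the starting conditions grow into
# large errors later on.
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(s, dt=0.01):
    # One fourth-order Runge-Kutta step of the Lorenz equations.
    k1 = lorenz(s)
    k2 = lorenz(s + 0.5 * dt * k1)
    k3 = lorenz(s + 0.5 * dt * k2)
    k4 = lorenz(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-6, 0.0, 0.0])   # a tiny "observation error"

for step in range(1, 2501):
    a, b = rk4_step(a), rk4_step(b)
    if step % 500 == 0:
        print(f"t = {step * 0.01:5.1f}  separation = {np.linalg.norm(a - b):.3e}")
```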
Should we abandon physics-based models?
Not yet.
Machine-learning models rely on data from traditional models and use their atmospheric starting points. Without traditional models, machine-learning models would be significantly less effective.
While excellent at forecasting large-scale features such as areas of high and low pressure, machine-learning models underperform traditional models at scales below 1,000km.
This means crucial features such as troughs and ridges might be missed, turning a forecast of dry weather into one of heavy rain.
Aftermath of the Boscastle Floods, Cornwall, August 2004
Most machine-learning models have a 28sq km resolution, matching their training data. This means small-scale features, such as showers, are likely to be missed, making it hard to predict events like the Boscastle floods days in advance.
Headlines have suggested that these models are better at predicting hurricanes.
While some models have shown slightly improved landfall predictions, wind strength predictions—and therefore potential damage—have been weaker. This might stem from smoothing effects when analyzing 40 years of hurricane data.
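One way to picture that suggested smoothing effect (my assumption about the mechanism, not a description of how any particular model works): averaging a sharply peaked wind field over a broad window preserves the overall pattern, and hence the storm's position, but flattens the peak winds.

```python
# Illustration of a smoothing effect (an assumption about the mechanism, not a
# statement about any specific model): averaging an idealised hurricane wind
# profile over a wide window keeps the broad pattern but weakens the peak.
import numpy as np

x = np.linspace(-300, 300, 601)                    # distance from storm centre, km
winds = 160 * np.exp(-(x / 40.0) ** 2) + 20        # idealised sharp wind peak, km/h

kernel = np.ones(101) / 101                        # roughly 100km moving average
smoothed = np.convolve(winds, kernel, mode="same")

print(f"peak wind before smoothing: {winds.max():.0f} km/h")
print(f"peak wind after smoothing:  {smoothed.max():.0f} km/h")
```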
The 1991 Mount Pinatubo eruption in the Philippines was the second largest of the 20th Century
AI models may struggle with rare events underrepresented in the 40-year training dataset. The 1991 Mount Pinatubo eruption, which caused a global temperature drop of up to 0.5C for two years, serves as an example.
There are also concerns about their performance in a warming world. The past climate data they are trained on will differ significantly from the future climate as greenhouse gas levels increase.
The Future in Five Years?
Professor Kirstine Dale, Met Office’s chief AI officer, envisions “traditional and AI models working together to provide hyper-localised, accurate, and rapid forecasts.”
Despite their short history, machine-learning models show immense potential given their speed, efficiency, and rapid development.