earth2mip
HENS multicheckpoint
Integrating multiple checkpoints for ensemble inference into Earth-2 MIP
Description
This is a large pull request that includes all the changes made to the core earth2mip package to support the work done in HENS (part i and part ii).
The 5 biggest additions to the package are:
- Incorporation of the bred vector method used in the HENS work above. This is a different implementation from the bred vectors currently in earth2mip. It also includes an implementation of spatiotemporally correlated noise generation, as used in SPPT.
- Initial condition perturbations are used twice: they are both added to and subtracted from the initial condition.
- Changing inference_ensemble to support a model named "multicheckpoint": if this model is specified, inference_ensemble loops over all model names in an ensemble with multiple SFNOs. I have currently hardcoded the names of the SFNOs that need to be run with multiple checkpoints.
- Changing the seed logic. Currently, in earth2mip, if a seed is set in the config, it seeds the random number generator once for the entire run. This logic is changed to support a different use case: suppose the user runs an ensemble with N members and then wants to regenerate ensemble member k to explore it in more detail (e.g. save more output fields). With earth2mip's current seed logic, the user would have to regenerate all N members to get the random number generator into the same state as it was during the original run. This requires a lot of unnecessary computation and storage.
  The seed logic is changed such that:
  - If no seed is set, the code automatically sets a seed for each ensemble member.
  - If a seed is set, the code assumes the user wants to regenerate a single member and uses that seed to regenerate it. The user must also set "subtract_perturbation" to true or false, depending on whether the perturbation should be added to or subtracted from the initial condition.
- Incorporation of diagnostics on extreme weather (threshold-weighted CRPS, ROC curves, extreme forecast index, and reliability diagrams)
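
To illustrate the seed logic and the add/subtract use of each perturbation described above, here is a minimal sketch. It is not the code in this PR: `perturbed_member` and `run_ensemble` are hypothetical names, white Gaussian noise stands in for the bred vectors, and the automatic per-member seeds are assumed to be simply 0, 1, 2, and so on.

```python
import numpy as np


def perturbed_member(x0, seed, subtract_perturbation, amplitude=0.1):
    """Build one member from its own seed (hypothetical helper, not earth2mip API)."""
    # A fresh generator per member means the result depends only on `seed`,
    # not on how many members were generated before it.
    rng = np.random.default_rng(seed)
    noise = amplitude * rng.standard_normal(x0.shape)  # stand-in for a bred vector
    return x0 - noise if subtract_perturbation else x0 + noise


def run_ensemble(x0, n_pairs, seed=None, subtract_perturbation=False):
    """Return a dict mapping (seed, subtract_perturbation) to the perturbed state."""
    if seed is not None:
        # A seed was given: regenerate only the requested member.
        key = (seed, subtract_perturbation)
        return {key: perturbed_member(x0, seed, subtract_perturbation)}
    members = {}
    for s in range(n_pairs):  # assumed automatic seeds: 0 .. n_pairs - 1
        for sub in (False, True):  # each perturbation is used twice: +noise and -noise
            members[(s, sub)] = perturbed_member(x0, s, sub)
    return members
```

With this layout, `run_ensemble(x0, n_pairs=3)` produces six members, and `run_ensemble(x0, n_pairs=0, seed=2, subtract_perturbation=True)` reproduces the member stored under `(2, True)` without regenerating the other five.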
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
- [x] The CHANGELOG.md is up to date with these changes.
- [x] An issue is linked to this pull request.
Dependencies
There are some new external data dependencies:
- The amplitude of the perturbations is set according to the deterministic RMSE of SFNO. An external file with these amplitudes is required.
- The extreme diagnostics pipeline requires precomputed values of the extreme thresholds (e.g. the 99th percentile).
- Calculating the extreme forecast index requires a large dataset (~1 TB) for each variable on which the index is to be calculated.
The first two require relatively small files, so I'd welcome input on how best to integrate them with the package.
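
For context on why the threshold file is small, here is a rough sketch of how a precomputed threshold could feed the threshold-weighted CRPS. This is not the pipeline in this PR: `tw_crps` is a hypothetical name, the inputs are assumed to be 1-D values at a single grid point, and the estimator is the standard ensemble form with the chaining function v(z) = max(z, t).

```python
import numpy as np


def tw_crps(ensemble, obs, threshold):
    """Ensemble estimate of the threshold-weighted CRPS (hypothetical helper).

    Uses the indicator weight w(z) = 1 for z >= threshold, which corresponds
    to the chaining function v(z) = max(z, threshold).
    """
    v_ens = np.maximum(ensemble, threshold)
    v_obs = max(obs, threshold)
    # Mean absolute error between transformed members and transformed observation.
    error_term = np.mean(np.abs(v_ens - v_obs))
    # Half the mean absolute difference over all pairs of transformed members.
    spread_term = 0.5 * np.mean(np.abs(v_ens[:, None] - v_ens[None, :]))
    return error_term - spread_term


def extreme_threshold(climatology, percentile=99.0):
    """One scalar per grid point and variable, which is all the metric needs."""
    return np.percentile(climatology, percentile)
```

Only the scalar returned by `extreme_threshold` would need to be stored per grid point and variable; the climatology used to compute it does not have to ship with the package.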
Extreme Forecast Index is a more data-intensive calculation. Calculating this metric requires a full model climatology, specified according to ECMWF's M-Climate. I'm not sure how best to include that.
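
To make the data requirement concrete, here is a simplified sketch of the EFI at a single grid point, using the commonly cited definition EFI = (2/π) ∫₀¹ (p − F_f(p)) / √(p(1 − p)) dp, where F_f(p) is the fraction of ensemble members at or below the p-quantile of the model climate. It is not the implementation in this PR: `extreme_forecast_index` is a hypothetical name, and `climate_samples` is assumed to be a 1-D array already drawn from the model climatology for that grid point and lead time.

```python
import numpy as np


def extreme_forecast_index(ensemble, climate_samples, n_nodes=2000):
    """Simplified EFI at one grid point (hypothetical helper, 1-D inputs)."""
    # Substituting p = sin(theta)^2 removes the endpoint singularities:
    # EFI = (4 / pi) * integral over [0, pi/2] of (p - F_f(p)) d(theta).
    theta = (np.arange(n_nodes) + 0.5) * (np.pi / 2) / n_nodes  # midpoint rule
    p = np.sin(theta) ** 2
    clim_quantiles = np.quantile(climate_samples, p)
    ens_sorted = np.sort(ensemble)
    # Fraction of members at or below each climate quantile.
    f_f = np.searchsorted(ens_sorted, clim_quantiles, side="right") / ens_sorted.size
    # (4 / pi) * (pi / 2) * mean(integrand) = 2 * mean(integrand)
    return 2.0 * np.mean(p - f_f)
```

The index is 0 when the forecast distribution matches the climate, and it approaches +1 or −1 when every member lies above or below the whole climate sample. The full set of climate quantiles is needed at every grid point, which is what makes this dependency so much larger than the threshold files.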
There are no new package dependencies.