# Adding NMSE metric
## PR Summary
This PR:
- adds an NMSE function, `geocat.comp.stats.nmse()`, heavily inspired by Isla Simpson's implementation in CUPiD (the paper, Isla's repo, and the CUPiD reference are all linked in the docstring)
- adds `test/util.py`, which generates some toy data. I can move this into `test_stats.py` if we don't want a separate util, but it's general-purpose enough that it may be useful elsewhere in the future
- adds tests for the new `nmse()` function, currently by comparing against a (slightly edited) copy of the CUPiD implementation to check correctness over the non-seeded random toy data
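For reviewers unfamiliar with the metric: a common definition of NMSE normalizes the mean squared error by the variance of the reference field, which matches the magnitudes in the example output later in this description (independent same-variance fields give values near 2). This is a minimal sketch of that definition in plain NumPy, not necessarily the exact formula `nmse()` uses; see the docstring for the authoritative references.

```python
import numpy as np


def nmse_sketch(model, obs):
    """Hypothetical NMSE: mean squared error normalized by the variance
    of the observations. Illustrative only; not necessarily the exact
    formula used by geocat.comp.stats.nmse()."""
    mse = np.mean((model - obs) ** 2)
    return mse / np.mean((obs - obs.mean()) ** 2)


# For two independent fields with equal variance, this NMSE is close
# to 2, consistent with the toy example output shown below.
rng = np.random.default_rng(0)
a = rng.normal(size=(10, 30))
b = rng.normal(size=(10, 30))
print(nmse_sketch(a, b))
```

An identical pair of fields gives exactly 0 under this definition, which makes it a convenient sanity check for the test suite.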
## Related Tickets & Documents
Closes #637
## PR Checklist
### General
- [x] PR includes a summary of changes
- [x] Link relevant issues, make one if none exist
- [x] Add a brief summary of changes to `docs/release-notes.rst` in a relevant section for the upcoming release
- [x] Add appropriate labels to this PR
- [x] PR follows the Contributor's Guide
### Functionality
- [x] New function(s) intended for public API added to `geocat/comp/__init__.py`
### Testing
- [x] Update or create tests in appropriate test file
### Documentation
- [x] Docstrings have been created and/or updated in accordance with Documentation Standards.
- [n/a] Internal functions have a preceding underscore (`_`) and have been added to `docs/internal_api/index.rst`
- [x] User-facing functions have been added to `docs/user_api/index.rst` under their module
Okay, I've significantly beefed up the guardrails around grid arrangements and coordinate handling with CF conventions, along with tests.
There is some behavior around which dimensions get reduced, present in both CUPiD's implementation and ours, that I'd like another set of eyes on to confirm it's intended.
Given two toy datasets, each with two temperature variables and three time steps, `nmse` returns a dataset with a `variable` coordinate corresponding to the two temperature variables, plus the original three time steps for each variable:
```python
>>> from geocat.comp.stats import nmse
>>> from test.util import make_toy_temp_dataset
>>> a = make_toy_temp_dataset()
>>> a
<xarray.Dataset> Size: 15kB
Dimensions:  (time: 3, lat: 10, lon: 30)
Coordinates:
  * time     (time) datetime64[ns] 24B 2023-01-01 2023-01-02 2023-01-03
  * lat      (lat) float64 80B -90.0 -70.0 -50.0 -30.0 ... 30.0 50.0 70.0 90.0
  * lon      (lon) float64 240B -180.0 -167.6 -155.2 ... 155.2 167.6 180.0
Data variables:
    t        (time, lat, lon) float64 7kB 8.369 10.26 4.113 ... 11.03 -0.4437
    t2       (time, lat, lon) float64 7kB 27.78 16.5 12.93 ... 16.19 19.09 8.084
Attributes:
    description:  Sample temperature data
    units:        Celsius
>>> b = make_toy_temp_dataset()
>>> b
<xarray.Dataset> Size: 15kB
Dimensions:  (time: 3, lat: 10, lon: 30)
Coordinates:
  * time     (time) datetime64[ns] 24B 2023-01-01 2023-01-02 2023-01-03
  * lat      (lat) float64 80B -90.0 -70.0 -50.0 -30.0 ... 30.0 50.0 70.0 90.0
  * lon      (lon) float64 240B -180.0 -167.6 -155.2 ... 155.2 167.6 180.0
Data variables:
    t        (time, lat, lon) float64 7kB 19.64 13.11 32.76 ... 25.99 20.68
    t2       (time, lat, lon) float64 7kB 19.49 24.29 13.15 ... 13.79 26.21
Attributes:
    description:  Sample temperature data
    units:        Celsius
>>> nmse(a, b)
<xarray.Dataset> Size: 136B
Dimensions:   (time: 3, variable: 2)
Coordinates:
  * time      (time) datetime64[ns] 24B 2023-01-01 2023-01-02 2023-01-03
  * variable  (variable) object 16B 't' 't2'
Data variables:
    t         (time, variable) float64 48B 2.059 2.059 2.059 2.059 1.852 1.852
    t2        (time, variable) float64 48B 1.748 1.748 1.884 1.884 2.255 2.255
Attributes:
    description:  Normalized Mean Squared Error (NMSE) between modeled and ob...
```
Is this what would be expected here? Should we also provide an option to reduce the time dimension in the `nmse()` function?
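One alternative to adding a reduction option inside `nmse()` would be to let callers collapse the time dimension of the result with plain xarray. A self-contained sketch, using a hand-built stand-in for the `(time, variable)` Dataset that `nmse()` returns (the values here are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the Dataset returned by nmse(): dims (time, variable).
# Values are fabricated just to make the snippet runnable.
result = xr.Dataset(
    {
        "t": (
            ("time", "variable"),
            np.array([[2.06, 2.06], [1.85, 1.85], [1.90, 1.90]]),
        )
    },
    coords={
        "time": pd.date_range("2023-01-01", periods=3),
        "variable": ["t", "t2"],
    },
)

# Callers can reduce over time after the fact, without any change to
# nmse() itself.
collapsed = result.mean(dim="time")
print(collapsed["t"].values)
```

Whether this is preferable to a built-in option probably comes down to whether CUPiD wants a single-call interface for time-averaged scores.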
This will be super useful for CUPiD issue 322!