geocat-comp

Adding NMSE metric

Open anissa111 opened this issue 1 month ago • 1 comments

PR Summary

This PR:

  • adds an NMSE function, geocat.comp.stats.nmse(), heavily inspired by Isla Simpson's implementation in CUPID (the paper, Isla's repo, and the CUPID reference are all linked in the docstring)
  • adds test/util.py, which generates some toy data. I can move this into test_stats.py if we don't want a separate util module, but it's general-purpose enough that it may be useful elsewhere in the future
  • adds tests for the new nmse() function. I'm currently doing this by comparing against a (slightly edited) copy of the implementation in CUPID to check for correctness over the non-seeded random toy data
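For readers unfamiliar with the metric, here is a minimal NumPy sketch of one common NMSE definition: the area-weighted mean squared error between model and observations, divided by the area-weighted spatial variance of the observations. This is not the code in this PR or in CUPID. The function name nmse_sketch, the cos(latitude) weighting, and the (time, lat, lon) array layout are all assumptions made for illustration.

```python
import numpy as np


def nmse_sketch(model, obs, lat):
    """Hypothetical area-weighted NMSE per time step.

    model, obs: arrays shaped (time, lat, lon); lat: latitudes in degrees.
    Assumes cos(latitude) area weights on a regular lat/lon grid.
    """
    # Broadcast cos(lat) weights to the full (time, lat, lon) shape
    w = np.cos(np.deg2rad(lat))[None, :, None] * np.ones_like(obs)
    wsum = w.sum(axis=(1, 2))

    # Weighted mean squared error over the spatial axes
    mse = (w * (model - obs) ** 2).sum(axis=(1, 2)) / wsum

    # Weighted spatial variance of the observations (the normalizer)
    obs_mean = (w * obs).sum(axis=(1, 2), keepdims=True) / wsum[:, None, None]
    obs_var = (w * (obs - obs_mean) ** 2).sum(axis=(1, 2)) / wsum

    return mse / obs_var


# Seeded toy data, loosely shaped like the 3 x 10 x 30 grids used in the tests
rng = np.random.default_rng(0)
lat = np.linspace(-90.0, 90.0, 10)
obs = rng.normal(15.0, 8.0, size=(3, 10, 30))
model = rng.normal(15.0, 8.0, size=(3, 10, 30))

result = nmse_sketch(model, obs, lat)  # one value per time step
```

Under this definition, a model identical to the observations scores 0, and two independent random fields with the same mean and variance would be expected to score roughly 2, since the expected squared difference is about twice the variance.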

Related Tickets & Documents

Closes #637

PR Checklist

General

  • [x] PR includes a summary of changes
  • [x] Link relevant issues, make one if none exist
  • [x] Add a brief summary of changes to docs/release-notes.rst in a relevant section for the upcoming release.
  • [x] Add appropriate labels to this PR
  • [x] PR follows the Contributor's Guide

Functionality

  • [x] New function(s) intended for public API added to geocat/comp/__init__.py file

Testing

  • [x] Update or create tests in appropriate test file

Documentation

  • [x] Docstrings have been created and/or updated in accordance with Documentation Standards.
  • [n/a] Internal functions have a preceding underscore (_) and have been added to docs/internal_api/index.rst
  • [x] User facing functions have been added to docs/user_api/index.rst under their module

anissa111 avatar Nov 18 '25 22:11 anissa111

Okay, I've significantly beefed up the guardrails around grid arrangements and coordinate handling with CF conventions, along with tests.

There is some behavior around which dimensions get reduced, present in both CUPID's implementation and ours, that I would like another set of eyes on to confirm it's the intended behavior.

I made two toy datasets, each with two temperature variables and three time steps. nmse() returns a dataset with a variable coordinate corresponding to the two temperature variables, and it keeps the original three time steps for each variable:

>>> from geocat.comp.stats import nmse
>>> from test.util import make_toy_temp_dataset
>>> a = make_toy_temp_dataset()
>>> a
<xarray.Dataset> Size: 15kB
Dimensions:  (time: 3, lat: 10, lon: 30)
Coordinates:
  * time     (time) datetime64[ns] 24B 2023-01-01 2023-01-02 2023-01-03
  * lat      (lat) float64 80B -90.0 -70.0 -50.0 -30.0 ... 30.0 50.0 70.0 90.0
  * lon      (lon) float64 240B -180.0 -167.6 -155.2 ... 155.2 167.6 180.0
Data variables:
    t        (time, lat, lon) float64 7kB 8.369 10.26 4.113 ... 11.03 -0.4437
    t2       (time, lat, lon) float64 7kB 27.78 16.5 12.93 ... 16.19 19.09 8.084
Attributes:
    description:  Sample temperature data
    units:        Celsius
>>> b = make_toy_temp_dataset()
>>> b
<xarray.Dataset> Size: 15kB
Dimensions:  (time: 3, lat: 10, lon: 30)
Coordinates:
  * time     (time) datetime64[ns] 24B 2023-01-01 2023-01-02 2023-01-03
  * lat      (lat) float64 80B -90.0 -70.0 -50.0 -30.0 ... 30.0 50.0 70.0 90.0
  * lon      (lon) float64 240B -180.0 -167.6 -155.2 ... 155.2 167.6 180.0
Data variables:
    t        (time, lat, lon) float64 7kB 19.64 13.11 32.76 ... 25.99 20.68
    t2       (time, lat, lon) float64 7kB 19.49 24.29 13.15 ... 13.79 26.21
Attributes:
    description:  Sample temperature data
    units:        Celsius
>>> nmse(a, b)
<xarray.Dataset> Size: 136B
Dimensions:   (time: 3, variable: 2)
Coordinates:
  * time      (time) datetime64[ns] 24B 2023-01-01 2023-01-02 2023-01-03
  * variable  (variable) object 16B 't' 't2'
Data variables:
    t         (time, variable) float64 48B 2.059 2.059 2.059 2.059 1.852 1.852
    t2        (time, variable) float64 48B 1.748 1.748 1.884 1.884 2.255 2.255
Attributes:
    description:  Normalized Mean Squared Error (NMSE) between modeled and ob...

Is this what would be expected here? Should we also provide an option to reduce the time dimension in the nmse function?
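To make the time-reduction question concrete, here is a small NumPy sketch of two ways a reduction over time could be defined. Neither is an existing option in geocat-comp or CUPID. The helper spatial_nmse, the cos(latitude) weights, and the seeded random data are assumptions for illustration only.

```python
import numpy as np


def spatial_nmse(model, obs, lat):
    """Hypothetical NMSE over the trailing (lat, lon) axes; leading axes are kept."""
    # Normalized cos(lat) weights on the (lat, lon) grid
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones(obs.shape[-2:])
    w = w / w.sum()

    mse = (w * (model - obs) ** 2).sum(axis=(-2, -1))
    obs_mean = (w * obs).sum(axis=(-2, -1), keepdims=True)
    obs_var = (w * (obs - obs_mean) ** 2).sum(axis=(-2, -1))
    return mse / obs_var


rng = np.random.default_rng(42)
lat = np.linspace(-90.0, 90.0, 10)
obs = rng.normal(15.0, 8.0, size=(3, 10, 30))
model = rng.normal(15.0, 8.0, size=(3, 10, 30))

# Current behavior as described above: one NMSE value per time step
per_step = spatial_nmse(model, obs, lat)

# Option A: average the per-time-step NMSE values
mean_of_steps = per_step.mean()

# Option B: average the fields over time first, then compute a single NMSE
nmse_of_time_mean = spatial_nmse(model.mean(axis=0), obs.mean(axis=0), lat)
```

The two options generally give different numbers, so if a time-reduction option is added it would help for the docstring to state which one is meant.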

anissa111 avatar Nov 26 '25 20:11 anissa111

This will be super useful for CUPiD issue 322!

TeaganKing avatar Dec 02 '25 18:12 TeaganKing