VirtualiZarr
Usage Example: The National Water Model Retrospective
Dataset:
The NOAA National Water Model Retrospective dataset contains input and output from multi-decade CONUS retrospective simulations. These simulations used meteorological input fields from existing meteorological retrospective datasets. The output frequency and fields available in this historical NWM dataset differ from those in the real-time operational NWM forecast model. https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html
Further, this dataset allows scientists to benchmark new methodologies against NWM 3.0 to determine future water model improvements.
How does VirtualiZarr help?
Within each year there are multiple variables, each with a single .nc file per hourly output from 1979-2023 (~380,000 files total). Rather than looking up every single file individually for each query, VirtualiZarr can help manage this large amount of legacy data by creating one virtual Zarr store per variable, enabling fast slicing and loads.
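The per-variable virtual store idea could be sketched as follows. This is a hedged sketch: `open_virtual_dataset`, concatenating virtual datasets with `xr.concat`, and the `.virtualize.to_kerchunk` accessor are VirtualiZarr/xarray APIs as of recent releases, while the hourly `YYYYMMDDHHMM.<VAR>_DOMAIN1` path pattern is inferred from the bucket listing below and may need adjusting per variable.

```python
# Sketch: build one virtual Zarr store per NWM variable for one year.
# Assumptions: hourly files named YYYYMMDDHHMM.<VAR>_DOMAIN1, and the
# VirtualiZarr kerchunk-writing accessor available in recent releases.
from datetime import datetime, timedelta


def hourly_paths(year: int, variable: str = "CHRTOUT") -> list[str]:
    """Generate the hourly S3 keys for one variable and one year."""
    t = datetime(year, 1, 1)
    paths = []
    while t.year == year:
        paths.append(
            "s3://noaa-nwm-retrospective-3-0-pds/CONUS/netcdf/"
            f"{variable}/{year}/{t:%Y%m%d%H%M}.{variable}_DOMAIN1"
        )
        t += timedelta(hours=1)  # one file per hourly output
    return paths


def build_virtual_store(year: int, variable: str = "CHRTOUT") -> None:
    # Imports deferred so the pure path helper above stays dependency-free.
    import xarray as xr
    from virtualizarr import open_virtual_dataset

    so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
    # Each call reads only metadata/byte ranges, not the data itself.
    vdss = [
        open_virtual_dataset(p, reader_options={"storage_options": so})
        for p in hourly_paths(year, variable)
    ]
    # Stitch the hourly snapshots into one virtual dataset along time,
    # then persist the references so later queries skip the file lookups.
    combined = xr.concat(vdss, dim="time", coords="minimal", compat="override")
    combined.virtualize.to_kerchunk(f"{variable}_{year}.json", format="json")
```

Once written, the reference file can be opened with xarray's kerchunk/zarr machinery and sliced without touching the ~8,760 individual NetCDF files per variable-year.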
See below for a sample output:
https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/CHRTOUT/1979
Example studies where this would be useful:
- https://essopenarchive.org/doi/full/10.22541/essoar.172736277.74497104
Existing packages which read the retrospective:
- https://rtiinternational.github.io/teehr/
Some more information on existing reads of the NWM retrospective through Coiled: https://docs.coiled.io/blog/coiled-xarray.html
@taddyb thanks for opening up this issue! This could be a really cool usage example. Have you gotten a chance to try out VirtualiZarr with any of the NWM files and have you run into any issues yet?
Yup! I noticed the crs variable throws an error when using open_virtual_dataset().
here is the code:
>>> so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
>>> vds = open_virtual_dataset(
...     's3://noaa-nwm-retrospective-3-0-pds/CONUS/netcdf/LDASOUT/2019/201905200000.LDASOUT_DOMAIN1',
...     reader_options={"storage_options": so},
...     filetype="netCDF4",
... )
and getting the following error:
Exception has occurred: TypeError
ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
  File "/Users/taddbindas/projects/NGWPC/f1_trainer/src/retrospective_reader/reader.py", line 64, in read
    's3://noaa-nwm-retrospective-3-0-pds/CONUS/netcdf/LDASOUT/2019/201905200000.LDASOUT_DOMAIN1',
    reader_options={"storage_options": so},
    filetype="netCDF4"
    )
  File "/Users/taddbindas/projects/NGWPC/f1_trainer/src/retrospective_reader/reader.py", line 89, in <module>
    read()
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I've noticed the crs variable uses a NumPy byte dtype (|S1) that isn't supported. When it's dropped, the code runs (see below for the crs dtype):
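The "drop it" workaround can be expressed directly at open time. This is a sketch assuming VirtualiZarr's `drop_variables` keyword to `open_virtual_dataset`; the helper name is hypothetical.

```python
# Workaround sketch: skip the scalar |S1 `crs` variable so the reader
# never attempts to virtualize it. Assumes open_virtual_dataset accepts
# a `drop_variables` keyword, as in recent VirtualiZarr releases.
def open_without_crs(path: str, storage_options: dict):
    # Import deferred so this module can be loaded without virtualizarr.
    from virtualizarr import open_virtual_dataset

    return open_virtual_dataset(
        path,
        drop_variables=["crs"],  # exclude the problematic bytes variable
        reader_options={"storage_options": storage_options},
        filetype="netCDF4",
    )
```

Note that dropping `crs` loses the projection metadata, so downstream georeferencing would need to recover it from another source.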
>>> ds.crs
<xarray.DataArray 'crs' ()> Size: 1B
[1 values with dtype=|S1]
Attributes: (12/17)
longitude_of_prime_meridian: 0.0
standard_parallel: [30. 60.]
longitude_of_central_meridian: -97.0
latitude_of_projection_origin: 40.0
false_easting: 0.0
false_northing: 0.0
... ...
esri_pe_string: PROJCS["Lambert_Conformal_Conic",GEOGCS["...
spatial_ref: PROJCS["Lambert_Conformal_Conic",GEOGCS["...
long_name: CRS definition
GeoTransform: -2303999.17655 1000.0 0 1919999.66329 0 -...
_CoordinateAxes: y x
_CoordinateTransformType: Projection
Thanks for writing this up!
Your go-to escape hatch for this sort of thing should be to add the problematic variable to loadable_variables.
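That escape hatch might look like the sketch below: `loadable_variables` asks VirtualiZarr to load the named variables eagerly into memory as ordinary xarray variables instead of virtualizing their chunks, sidestepping the |S1 dtype issue. The path and storage options are the ones from the failing snippet above; the function wrapper is hypothetical.

```python
# Sketch of the suggested fix: load `crs` into memory rather than
# virtualizing it. Assumes the `loadable_variables` parameter of
# virtualizarr.open_virtual_dataset, per the VirtualiZarr docs.
def open_with_loaded_crs():
    # Import deferred so this module can be loaded without virtualizarr.
    from virtualizarr import open_virtual_dataset

    so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
    return open_virtual_dataset(
        "s3://noaa-nwm-retrospective-3-0-pds/CONUS/netcdf/LDASOUT/2019/"
        "201905200000.LDASOUT_DOMAIN1",
        loadable_variables=["crs"],  # load crs eagerly; virtualize the rest
        reader_options={"storage_options": so},
        filetype="netCDF4",
    )
```

Unlike dropping the variable, this keeps the CRS attributes available in the resulting dataset, so the projection metadata survives into the virtual store.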