VirtualiZarr
Usage Example: The National Water Model Retrospective
Dataset:
The NOAA National Water Model Retrospective dataset contains input and output from multi-decade CONUS retrospective simulations. These simulations used meteorological input fields from existing meteorological retrospective datasets. The output frequency and fields available in this historical NWM dataset differ from those in the real-time operational NWM forecast model. https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html
Further, this dataset allows scientists to benchmark new methodologies against NWM 3.0 to determine future water model improvements.
How does VirtualiZarr help?
Within each year there are multiple variables, each with a single .nc file per hourly output from 1979-2023 (~380,000 files total). Rather than looking up every single file individually for each query, VirtualiZarr can help manage this large amount of legacy data by creating one virtual Zarr store per variable, enabling fast slicing and loads.
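The per-variable virtual store idea could be sketched as follows. This is a hedged sketch: `open_virtual_dataset`, concatenating virtual datasets with `xr.concat`, and the `.virtualize.to_kerchunk` accessor are VirtualiZarr/xarray APIs as of recent releases, while the hourly `YYYYMMDDHHMM.<VAR>_DOMAIN1` path pattern is inferred from the bucket listing below and may need adjusting per variable.

```python
# Sketch: build one virtual Zarr store per NWM variable for one year.
# Assumptions: hourly files named YYYYMMDDHHMM.<VAR>_DOMAIN1, and the
# VirtualiZarr kerchunk-writing accessor available in recent releases.
from datetime import datetime, timedelta


def hourly_paths(year: int, variable: str = "CHRTOUT") -> list[str]:
    """Generate the hourly S3 keys for one variable and one year."""
    t = datetime(year, 1, 1)
    paths = []
    while t.year == year:
        paths.append(
            "s3://noaa-nwm-retrospective-3-0-pds/CONUS/netcdf/"
            f"{variable}/{year}/{t:%Y%m%d%H%M}.{variable}_DOMAIN1"
        )
        t += timedelta(hours=1)  # one file per hourly output
    return paths


def build_virtual_store(year: int, variable: str = "CHRTOUT") -> None:
    # Imports deferred so the pure path helper above stays dependency-free.
    import xarray as xr
    from virtualizarr import open_virtual_dataset

    so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
    # Each call reads only metadata/byte ranges, not the data itself.
    vdss = [
        open_virtual_dataset(p, reader_options={"storage_options": so})
        for p in hourly_paths(year, variable)
    ]
    # Stitch the hourly snapshots into one virtual dataset along time,
    # then persist the references so later queries skip the file lookups.
    combined = xr.concat(vdss, dim="time", coords="minimal", compat="override")
    combined.virtualize.to_kerchunk(f"{variable}_{year}.json", format="json")
```

Once written, the reference file can be opened with xarray's kerchunk/zarr machinery and sliced without touching the ~8,760 individual NetCDF files per variable-year.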
See below for a sample output:
https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/CHRTOUT/1979
Example studies where this would be useful:
- https://essopenarchive.org/doi/full/10.22541/essoar.172736277.74497104
Existing packages which read the retrospective:
- https://rtiinternational.github.io/teehr/
Some more information on existing reads of the NWM retrospective through Coiled: https://docs.coiled.io/blog/coiled-xarray.html
@taddyb thanks for opening up this issue! This could be a really cool usage example. Have you gotten a chance to try out VirtualiZarr with any of the NWM files and have you run into any issues yet?
Yup! I noticed the crs variable throws an error when using open_virtual_dataset().
here is the code:
>>> so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
>>> vds = open_virtual_dataset(
...     's3://noaa-nwm-retrospective-3-0-pds/CONUS/netcdf/LDASOUT/2019/201905200000.LDASOUT_DOMAIN1',
...     reader_options={"storage_options": so},
...     filetype="netCDF4",
... )
and getting the following error:
Exception has occurred: TypeError
ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
  File "/Users/taddbindas/projects/NGWPC/f1_trainer/src/retrospective_reader/reader.py", line 64, in read
    's3://noaa-nwm-retrospective-3-0-pds/CONUS/netcdf/LDASOUT/2019/201905200000.LDASOUT_DOMAIN1',
    reader_options={"storage_options": so},
    filetype="netCDF4"
    )
  File "/Users/taddbindas/projects/NGWPC/f1_trainer/src/retrospective_reader/reader.py", line 89, in <module>
    read()
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I've noticed the crs variable uses a NumPy byte dtype (|S1) that isn't supported. When it's dropped, the code runs (see below for the crs dtype):
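The "drop it" workaround can be expressed directly at open time. This is a sketch assuming VirtualiZarr's `drop_variables` keyword to `open_virtual_dataset`; the helper name is hypothetical.

```python
# Workaround sketch: skip the scalar |S1 `crs` variable so the reader
# never attempts to virtualize it. Assumes open_virtual_dataset accepts
# a `drop_variables` keyword, as in recent VirtualiZarr releases.
def open_without_crs(path: str, storage_options: dict):
    # Import deferred so this module can be loaded without virtualizarr.
    from virtualizarr import open_virtual_dataset

    return open_virtual_dataset(
        path,
        drop_variables=["crs"],  # exclude the problematic bytes variable
        reader_options={"storage_options": storage_options},
        filetype="netCDF4",
    )
```

Note that dropping `crs` loses the projection metadata, so downstream georeferencing would need to recover it from another source.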
>>> ds.crs
<xarray.DataArray 'crs' ()> Size: 1B
[1 values with dtype=|S1]
Attributes: (12/17)
longitude_of_prime_meridian: 0.0
standard_parallel: [30. 60.]
longitude_of_central_meridian: -97.0
latitude_of_projection_origin: 40.0
false_easting: 0.0
false_northing: 0.0
... ...
esri_pe_string: PROJCS["Lambert_Conformal_Conic",GEOGCS["...
spatial_ref: PROJCS["Lambert_Conformal_Conic",GEOGCS["...
long_name: CRS definition
GeoTransform: -2303999.17655 1000.0 0 1919999.66329 0 -...
_CoordinateAxes: y x
_CoordinateTransformType: Projection
Thanks for writing this up!
Your go-to escape hatch for this sort of thing should be to add the problematic variable to loadable_variables.
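That escape hatch might look like the sketch below: `loadable_variables` asks VirtualiZarr to load the named variables eagerly into memory as ordinary xarray variables instead of virtualizing their chunks, sidestepping the |S1 dtype issue. The path and storage options are the ones from the failing snippet above; the function wrapper is hypothetical.

```python
# Sketch of the suggested fix: load `crs` into memory rather than
# virtualizing it. Assumes the `loadable_variables` parameter of
# virtualizarr.open_virtual_dataset, per the VirtualiZarr docs.
def open_with_loaded_crs():
    # Import deferred so this module can be loaded without virtualizarr.
    from virtualizarr import open_virtual_dataset

    so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
    return open_virtual_dataset(
        "s3://noaa-nwm-retrospective-3-0-pds/CONUS/netcdf/LDASOUT/2019/"
        "201905200000.LDASOUT_DOMAIN1",
        loadable_variables=["crs"],  # load crs eagerly; virtualize the rest
        reader_options={"storage_options": so},
        filetype="netCDF4",
    )
```

Unlike dropping the variable, this keeps the CRS attributes available in the resulting dataset, so the projection metadata survives into the virtual store.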