Be able to override calendar in `open_dataset`/`open_mfdataset`/etc. OR include another calendar name
Is your feature request related to a problem?
I think there was a version of ROMS in which the calendar was written as "gregorian_proleptic" instead of "proleptic_gregorian". xarray only accepts the latter as a valid calendar name. Unfortunately, this keeps coming up when I need to deal with model output from such ROMS simulations. I access model output through catalogs (e.g., intake, stac), so I need to be able to pass flags to the open_* command in order to read the model output from the catalog, rather than running a command afterward to, say, overwrite the calendar with the correct name.
Describe the solution you'd like
I would like to either:
- include "gregorian_proleptic" on the known list of calendars, or
- be able to provide a keyword argument to the "open_*" commands to declare the calendar I want to use, overwriting what is in the file metadata.
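For the first option, a user-side stand-in for a longer list of known calendars might look like the sketch below. The alias table `CALENDAR_ALIASES` and the function `normalize_calendar_attrs` are hypothetical names, not part of xarray, and the sketch assumes the dataset was opened with `decode_times=False` so that the raw `calendar` attribute is still present in `attrs`.

```python
import numpy as np
import xarray as xr

# Hypothetical alias table (not xarray's internal calendar list): maps
# non-standard spellings to the CF calendar names that xarray/cftime accept.
CALENDAR_ALIASES = {
    "gregorian_proleptic": "proleptic_gregorian",
}


def normalize_calendar_attrs(ds: xr.Dataset) -> xr.Dataset:
    """Return a copy of ds with aliased `calendar` attributes rewritten.

    Expects undecoded times (e.g. opened with decode_times=False), because
    decoding moves `units` and `calendar` out of attrs and into encoding.
    """
    ds = ds.copy()
    for var in ds.variables.values():
        calendar = var.attrs.get("calendar")
        if isinstance(calendar, str) and calendar.lower() in CALENDAR_ALIASES:
            var.attrs["calendar"] = CALENDAR_ALIASES[calendar.lower()]
    return ds


# In-memory stand-in for an undecoded ROMS time coordinate.
raw = xr.Dataset(
    coords={
        "time": (
            "time",
            np.array([0.0, 3600.0]),
            {"units": "seconds since 2020-02-08 18:00:00", "calendar": "gregorian_proleptic"},
        )
    }
)
fixed = normalize_calendar_attrs(raw)
decoded = xr.decode_cf(fixed)  # now decodes with the CF calendar name
```

Returning a copy rather than mutating in place keeps the function safe to call on a dataset that is also used elsewhere.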
Describe alternatives you've considered
I have used decode_times=False in this situation before and then sort of forced the datetimes into submission, but that solution won't work with a catalog setup, in which all the keywords needed to open the file(s) must be in the catalog entry.
Additional context
This code demonstrates the issue:
```python
import xarray as xr
url = 'https://www.ncei.noaa.gov/thredds/dodsC/model-cbofs-files/2020/02/nos.cbofs.fields.n006.20200208.t18z.nc'
ds = xr.open_dataset(url, drop_variables=['dstart'])
```
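Because the URL above depends on a remote THREDDS server, here is a minimal offline sketch of the same failure. The in-memory dataset is an assumption standing in for the ROMS file (only the `time` coordinate is mimicked), and `make_raw`/`decodes_cleanly` are hypothetical helper names. With default settings, xarray's CF decoding should reject the misspelled calendar with a `ValueError`, while the identical variable with the CF spelling decodes normally, which isolates the calendar name as the problem.

```python
import numpy as np
import xarray as xr


def make_raw(calendar: str) -> xr.Dataset:
    """Build an undecoded stand-in for a model time coordinate."""
    return xr.Dataset(
        coords={
            "time": (
                "time",
                np.array([0.0, 3600.0]),
                {"units": "seconds since 2020-02-08 18:00:00", "calendar": calendar},
            )
        }
    )


def decodes_cleanly(ds: xr.Dataset) -> bool:
    """Return True if xarray's CF decoding accepts the dataset's time metadata."""
    try:
        xr.decode_cf(ds)
    except ValueError:
        return False
    return True


print(decodes_cleanly(make_raw("gregorian_proleptic")))  # misspelled calendar
print(decodes_cleanly(make_raw("proleptic_gregorian")))  # CF calendar name
```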
This sounds like it could theoretically be handled using intake derived datasets. To be fair, derived datasets are probably still in their early stages. But the basic idea would be to apply arbitrary transformations to a dataset after it has been opened (e.g. with decode_cf=False) and represent the outcome of this transformation as an entry in the catalog. A suitable transformation function might be something like:
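```python
import xarray as xr

def fix_calendar(ds):
    # ds must be opened undecoded (decode_cf=False) so "time" keeps its raw attrs
    ds.time.attrs["calendar"] = "proleptic_gregorian"
    return xr.decode_cf(ds)
```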
... but maybe it is still more convenient or useful to handle it in xarray directly (e.g. I don't know if stac has a similar approach).
Thanks @d70-t for the idea! I haven't tried out the derived datasets capabilities in intake, but I'll give them a try. Sounds like they could be pretty powerful.
You could do the same correction with the preprocess kwarg to open_mfdataset (which can also handle a single file). But if intake only uses open_dataset, we could consider adding preprocess to open_dataset.
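One caveat when applying this to the calendar problem: `open_mfdataset` calls `open_dataset` on each file first and only then runs `preprocess`, so with the default `decode_times=True` the misspelled calendar fails before the correction is reached. A sketch that defers time decoding to the preprocess step might look like this; `fix_calendar` and `open_model_output` are illustrative names, and only the hook is exercised here, on an in-memory stand-in rather than real files.

```python
import numpy as np
import xarray as xr


def fix_calendar(ds: xr.Dataset) -> xr.Dataset:
    """Preprocess hook: correct the calendar spelling, then decode times."""
    if ds["time"].attrs.get("calendar") == "gregorian_proleptic":
        ds["time"].attrs["calendar"] = "proleptic_gregorian"
    return xr.decode_cf(ds)


def open_model_output(paths):
    # decode_times=False keeps units/calendar in attrs until fix_calendar runs
    return xr.open_mfdataset(paths, decode_times=False, preprocess=fix_calendar)


# Exercise the hook on an in-memory stand-in for one undecoded file.
raw = xr.Dataset(
    coords={
        "time": (
            "time",
            np.array([0.0, 3600.0]),
            {"units": "seconds since 2020-02-08 18:00:00", "calendar": "gregorian_proleptic"},
        )
    }
)
ds = fix_calendar(raw)
```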
What?! Whoa, I did not know about the preprocess option, and it looks really powerful! I have been getting the derived datasets to work, but I think this would do the job in a simpler, easier-to-understand way. I will give it a try.
intake-xarray should now work with open_mfdataset; I added this as an option, though it's probably not in a release yet.
@kthyng I've hit a similar error. I just want to impose a calendar attribute on the time coordinate, but it seems to be a significant pain to get there:
```python
def setCalendar(ds):
    # ds.time.calendar = "standard"
    # ds["time"].calendar
    ds.time.attrs['calendar'] = 'standard'
    return xr.decode_cf(ds)

ds = xr.open_mfdataset(filesPath, preprocess=setCalendar)
```
This gives me:
```
2023-11-22 15:20:16,878 [WARNING]: dataset.py(_is_decodable:670) >> 'time' does not have a 'units' attribute set so it could not be decoded. Try setting the 'units' attribute (`ds.{coords.name}.attrs['units']`) and try decoding again.
2023-11-22 15:20:16,878 [WARNING]: dataset.py(_is_decodable:670) >> 'time' does not have a 'units' attribute set so it could not be decoded. Try setting the 'units' attribute (`ds.{coords.name}.attrs['units']`) and try decoding again.
```
So it seems to have blasted all the existing attributes, even though the time coordinate is already fairly well described:
```
standard_name : time
axis : T
long_name : nominal time of L4 analysis
coverage_content_type : coordinate
```
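A likely explanation for the missing attributes, sketched below on a hypothetical in-memory stand-in (not the file from this report): when xarray decodes times, which `open_mfdataset` does by default, it moves `units` (and `calendar`, if present) from `.attrs` into `.encoding`, while descriptive attributes such as `standard_name` stay in `.attrs`. A second `xr.decode_cf` call inside `preprocess` therefore sees a time variable without `units`, which is consistent with the warning above. Keeping those attributes available for the preprocess step requires opening with `decode_times=False`.

```python
import numpy as np
import xarray as xr

# Undecoded stand-in for a time coordinate with CF metadata.
raw = xr.Dataset(
    coords={
        "time": (
            "time",
            np.array([0.0, 1.0]),
            {"units": "days since 2023-11-01", "standard_name": "time", "axis": "T"},
        )
    }
)

decoded = xr.decode_cf(raw)  # roughly what open_mfdataset does by default

# Descriptive attributes stay in attrs ...
print(sorted(decoded["time"].attrs))
# ... while the decoding metadata now lives in encoding.
print(decoded["time"].encoding["units"])
```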
Having an `open_mfdataset(filesPath, set_cf_calendar="standard")` option would seem to be a great, simple fix for my (and presumably your) issue.
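Until such a keyword exists, a thin wrapper can approximate it. In this sketch `open_mfdataset_with_calendar` and its `set_cf_calendar`/`time_var` parameters are hypothetical (the keyword is only proposed here, it is not an xarray option). The wrapper forces `decode_times=False` and builds the `preprocess` hook with `functools.partial`, so callers only pass a calendar name. Only the hook is exercised below, on an in-memory stand-in.

```python
import functools

import numpy as np
import xarray as xr


def _override_calendar(ds: xr.Dataset, calendar: str, time_var: str) -> xr.Dataset:
    """Overwrite the calendar of an undecoded time variable, then decode CF metadata."""
    ds[time_var].attrs["calendar"] = calendar
    return xr.decode_cf(ds)


def open_mfdataset_with_calendar(paths, set_cf_calendar, time_var="time", **kwargs):
    """Hypothetical wrapper approximating the proposed set_cf_calendar keyword."""
    # Overrides any caller-supplied decode_times so units/calendar stay in attrs
    kwargs["decode_times"] = False
    hook = functools.partial(_override_calendar, calendar=set_cf_calendar, time_var=time_var)
    return xr.open_mfdataset(paths, preprocess=hook, **kwargs)


# The hook alone, applied to a stand-in whose file carries no calendar attribute.
raw = xr.Dataset(
    coords={"time": ("time", np.array([0.0, 1.0]), {"units": "days since 2023-11-01"})}
)
hook = functools.partial(_override_calendar, calendar="standard", time_var="time")
ds = hook(raw)
```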
@durack1 Yep agreed!