How to avoid variables with no dimensions from picking up concat_dim?

Open rsignell-usgs opened this issue 2 years ago • 9 comments

@martindurant, I've got a collection of NetCDF files I'm combining with kerchunk along the ocean_time dimension.

These files have model constants stored as single-value variables with no dimensions.

[Screenshot 2022-10-21_11-28-36: the constants in a single file, with no dimensions]

When I combine them, these variables are picking up the ocean_time dimension, but they should not:

[Screenshot 2022-10-21_11-28-59: the same variables after combining, now with an ocean_time dimension]

How can I avoid this?

Reproducible notebook here: https://nbviewer.org/gist/e0c74a4a7e947e5d04fa3e82147ff146

rsignell-usgs avatar Oct 21 '22 15:10 rsignell-usgs

Is the ocean_time actually the same in all of these?

martindurant avatar Oct 21 '22 15:10 martindurant

These variables should not have an ocean_time dimension at all. But they are acquiring one through the concat process.

rsignell-usgs avatar Oct 21 '22 15:10 rsignell-usgs

How should the output look?

martindurant avatar Oct 21 '22 15:10 martindurant

The combined dataset should leave single-value constants as constants, with no dimensions, at least in this case, because the constants are the same in every file.

I guess there could be cases where the constants change in every file and you would want them to have a time dimension.

But that's not the case here.

rsignell-usgs avatar Oct 21 '22 15:10 rsignell-usgs

OK, so sounds like "dstart" should be in the "identical_dims"

martindurant avatar Oct 21 '22 15:10 martindurant

I added "dstart" to the "identical_dims" but got the same result. Perhaps because "dstart" doesn't have dims?

rsignell-usgs avatar Oct 21 '22 15:10 rsignell-usgs

I can't look into it right now, but ping me next week. So are all of the values of dstart the same? Is it a plausible dimension for future expansion with more variables (i.e., maybe it should be a concat_coord)?

martindurant avatar Oct 22 '22 12:10 martindurant

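By adding "dstart" to identical_dims in MultiZarrToZarr, I got the result you were after:

from kerchunk.combine import MultiZarrToZarr

# json_list (the single-file reference JSONs) and opts (storage options) are defined elsewhere
mzz = MultiZarrToZarr(
    json_list,
    remote_protocol='s3',
    remote_options=opts,
    target_options=opts,
    concat_dims=['ocean_time'],
    # dstart is listed alongside the shared coordinates so it is copied once, not concatenated
    identical_dims=['lat_psi', 'lat_rho', 'lat_u', 'lat_v',
                    'lon_psi', 'lon_rho', 'lon_u', 'lon_v', 'dstart'],
)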

->

<xarray.DataArray 'dstart' ()>
array('2022-07-29T00:00:00.000000000', dtype='datetime64[ns]')
Attributes:
    long_name:  time stamp assigned to model initilization

martindurant avatar Oct 25 '22 17:10 martindurant

close?

martindurant avatar Oct 31 '22 15:10 martindurant