VirtualiZarr

Can't concat virtual datasets without dropping CF variables

Open rsignell opened this issue 2 months ago • 5 comments

I've got some virtual datasets that I need to concatenate along the time dimension.

Following the CF conventions, the dataset variables include time_bnds and the grid_mapping variable rotated_pole.

Creating a list of virtual datasets with VirtualiZarr works fine.

Here's what one virtual dataset in the list looks like:

Image

If I drop the time_bnds and rotated_pole variables from each virtual dataset, I can concatenate with:

combined_ds = xr.concat(
    ds_list,
    dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="override",
)

but if I don't drop them, I get:

File /srv/conda/envs/notebook/lib/python3.13/site-packages/xarray/core/duck_array_ops.py:261, in asarray(data, xp, dtype)
    258     return converted
    260 if xp is np or not hasattr(xp, "astype"):
--> 261     return converted.astype(dtype)
    262 else:
    263     return xp.astype(converted, dtype)

File /srv/conda/envs/notebook/lib/python3.13/site-packages/virtualizarr/manifests/array.py:209, in ManifestArray.astype(self, dtype, copy)
    207 """Cannot change the dtype, but needed because xarray will call this even when it's a no-op."""
    208 if dtype != self.dtype:
--> 209     raise NotImplementedError()
    210 else:
    211     return self

NotImplementedError: 

A reproducer notebook is here: https://gist.github.com/rsignell/88bd394bc76f30c6f5517da263490d9d

Am I doing anything wrong here?

rsignell, Nov 06 '25 14:11

I would expect that error (which could be clearer) to only be raised if you attempt to concatenate two variables with different dtypes. Are you sure that the time_bnds and rotated_pole variables have the same dtype across all the datasets in your list?
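One way to run that check, as a minimal sketch: collect the set of dtypes each variable takes across the list. The helper name `dtypes_by_variable` and the small NumPy-backed datasets are illustrative assumptions, not part of VirtualiZarr; the same `.dtype` lookup should apply to the real `ds_list`.

```python
import numpy as np
import xarray as xr


def dtypes_by_variable(ds_list, names):
    """Map each variable name to the set of dtypes it has across ds_list."""
    return {name: {ds[name].dtype for ds in ds_list} for name in names}


# Hypothetical stand-ins for two datasets in the list (deliberately mismatched).
ds_a = xr.Dataset({"time_bnds": (("time", "bnds"), np.zeros((2, 2), dtype="float64"))})
ds_b = xr.Dataset({"time_bnds": (("time", "bnds"), np.zeros((2, 2), dtype="float32"))})

report = dtypes_by_variable([ds_a, ds_b], ["time_bnds"])

# Any variable with more than one dtype in its set differs between datasets.
mismatched = [name for name, dtypes in report.items() if len(dtypes) > 1]
print(mismatched)
```

A variable whose set contains more than one dtype is a candidate for the `astype` error shown in the traceback.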

TomNicholas, Nov 06 '25 14:11

I'm not at all sure! I will check!

rsignell, Nov 06 '25 17:11

I checked and the time_bnds and rotated_pole variables have the same dtype across all the datasets (there are only 2 datasets in this example).

But in the process of finding out what the variable types were, I tried including time_bnds and rotated_pole in the loadable variables list, and then to my surprise, the concatenation worked:

Image

This working notebook should be reproducible: https://gist.github.com/rsignell/11b67b82845b9cd9df84d956ed1ac901

Does this make sense?

rsignell, Nov 07 '25 10:11

Uh, okay, I realized that the above workflow added a time dimension to rotated_pole, which of course I didn't want:

Image

And then when I added data_vars="minimal" to the xr.concat() command to remove the time dimension from rotated_pole, I went back and discovered that with that parameter I didn't need to modify loadable_variables at all!

So this works, and is of course what I will use going forward: Image

Full notebook here: https://gist.github.com/rsignell/7e13856d2b73573fd927e0bab78cd2cd
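Since the screenshot is not reproduced here, the following is a minimal sketch of the kind of call described above. It uses small NumPy-backed datasets as hypothetical stand-ins for the virtual datasets (the variable `tas`, the helper `make_ds`, and all shapes are assumptions for illustration), so it shows only how `data_vars` affects dimensions, not VirtualiZarr behaviour itself.

```python
import numpy as np
import xarray as xr


def make_ds(t0):
    """Build a hypothetical stand-in for one dataset in the list."""
    time = np.arange(t0, t0 + 3)
    return xr.Dataset(
        data_vars={
            "tas": (("time", "y", "x"), np.zeros((3, 2, 2), dtype="float32")),
            "time_bnds": (("time", "bnds"), np.stack([time, time + 1], axis=1)),
            # Scalar grid_mapping variable with no time dimension.
            "rotated_pole": ((), np.int32(0), {"grid_mapping_name": "rotated_latitude_longitude"}),
        },
        coords={"time": time},
    )


ds_list = [make_ds(0), make_ds(3)]

# data_vars="minimal" concatenates only variables that already have "time".
combined = xr.concat(
    ds_list,
    dim="time",
    data_vars="minimal",
    coords="minimal",
    compat="override",
    combine_attrs="override",
)

# For contrast: data_vars="all" also concatenates rotated_pole along "time".
expanded = xr.concat(
    ds_list,
    dim="time",
    data_vars="all",
    coords="minimal",
    compat="override",
    combine_attrs="override",
)

print(combined["rotated_pole"].dims, expanded["rotated_pole"].dims)
```

With `data_vars="minimal"`, `rotated_pole` is taken from the first dataset unchanged, while `time_bnds` is still concatenated because it already carries the `time` dimension.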

rsignell, Nov 07 '25 10:11

I think this is all expected behaviour (if hidden behind many unintuitive defaults). The only bit that doesn't make sense to me is the idea that concatenating variables of the same dtype would raise this error:

File /srv/conda/envs/notebook/lib/python3.13/site-packages/virtualizarr/manifests/array.py:209, in ManifestArray.astype(self, dtype, copy)
    207 """Cannot change the dtype, but needed because xarray will call this even when it's a no-op."""
    208 if dtype != self.dtype:
--> 209     raise NotImplementedError()
    210 else:
    211     return self

TomNicholas, Nov 10 '25 15:11