Can't concat virtual datasets without dropping CF variables
I've got some virtual datasets that I need to concatenate along the time dimension.
Following the CF conventions, the dataset variables include time_bnds and the grid_mapping variable rotated_pole.
Creating a list of virtual datasets with VirtualiZarr works fine.
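Roughly, the list is built like this (a simplified sketch; the real file names and options are in the notebook linked below):

    import xarray as xr
    from virtualizarr import open_virtual_dataset

    # hypothetical file names; the notebook uses the actual NetCDF files
    files = ["data_2020.nc", "data_2021.nc"]

    # one virtual dataset per file, keeping the coordinates virtual
    ds_list = [open_virtual_dataset(f, indexes={}) for f in files]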
Here's what one virtual dataset in the list looks like:
If I drop the time_bnds and rotated_pole variables from each virtual dataset, I can concatenate with:
combined_ds = xr.concat(
    ds_list,
    dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="override",
)
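(For completeness, the dropping step looks roughly like this, reusing ds_list from the sketch above:)

    ds_list_dropped = [ds.drop_vars(["time_bnds", "rotated_pole"]) for ds in ds_list]
    combined_ds = xr.concat(
        ds_list_dropped,
        dim="time",
        coords="minimal",
        compat="override",
        combine_attrs="override",
    )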
but if I don't drop them, I get:
File /srv/conda/envs/notebook/lib/python3.13/site-packages/xarray/core/duck_array_ops.py:261, in asarray(data, xp, dtype)
258 return converted
260 if xp is np or not hasattr(xp, "astype"):
--> 261 return converted.astype(dtype)
262 else:
263 return xp.astype(converted, dtype)
File /srv/conda/envs/notebook/lib/python3.13/site-packages/virtualizarr/manifests/array.py:209, in ManifestArray.astype(self, dtype, copy)
207 """Cannot change the dtype, but needed because xarray will call this even when it's a no-op."""
208 if dtype != self.dtype:
--> 209 raise NotImplementedError()
210 else:
211 return self
NotImplementedError:
A reproducer notebook is here: https://gist.github.com/rsignell/88bd394bc76f30c6f5517da263490d9d
Am I doing anything wrong here?
I would expect that error (which could be clearer) to only be raised if you attempt to concatenate two variables with different dtypes. Are you sure that the time_bnds and rotated_pole variables have the same dtype across all the datasets in your list?
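Something like this would confirm it (assuming your list is called ds_list):

    for i, ds in enumerate(ds_list):
        print(i, ds["time_bnds"].dtype, ds["rotated_pole"].dtype)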
I'm not at all sure! I will check!
I checked and the time_bnds and rotated_pole variables have the same dtype across all the datasets (there are only 2 datasets in this example).
But in the process of finding out what the variable dtypes were, I tried including time_bnds and rotated_pole in the loadable_variables list, and then, to my surprise, the concatenation worked:
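Roughly what I did (same files list and import as the earlier sketch; the exact call is in the notebook):

    ds_list = [
        open_virtual_dataset(
            f,
            loadable_variables=["time_bnds", "rotated_pole"],
            indexes={},
        )
        for f in files
    ]
    combined_ds = xr.concat(
        ds_list,
        dim="time",
        coords="minimal",
        compat="override",
        combine_attrs="override",
    )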
This working notebook should be reproducible: https://gist.github.com/rsignell/11b67b82845b9cd9df84d956ed1ac901
Does this make sense?
Uh, okay, I realized that the above workflow added a time dimension to rotated_pole, which of course I didn't want:
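(A quick check like this made it obvious:)

    print(combined_ds["rotated_pole"].dims)  # now includes "time", which it shouldn't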
And then, when I added data_vars="minimal" to the xr.concat() call to keep the time dimension off rotated_pole, I went back and discovered that with that parameter I didn't need to modify the loadable_variables at all!
So this works, and is of course what I will use going forward:
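In sketch form (the full version is in the notebook):

    combined_ds = xr.concat(
        ds_list,
        dim="time",
        data_vars="minimal",
        coords="minimal",
        compat="override",
        combine_attrs="override",
    )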
Full notebook here: https://gist.github.com/rsignell/7e13856d2b73573fd927e0bab78cd2cd
I think this is all expected behaviour (if hidden behind many unintuitive defaults). The only bit that doesn't make sense to me is that concatenating variables of the same dtype would raise this error:
File /srv/conda/envs/notebook/lib/python3.13/site-packages/virtualizarr/manifests/array.py:209, in ManifestArray.astype(self, dtype, copy)
207 """Cannot change the dtype, but needed because xarray will call this even when it's a no-op."""
208 if dtype != self.dtype:
--> 209 raise NotImplementedError()
210 else:
211 return self