to_zarr() does not maintain time encoding when appending to an existing store
What happened?
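When writing a dataset to a zarr store, the time encoding is correctly stored and retrieved. However, when appending to an existing store with append_dim, to_zarr() refuses an encoding argument for a variable that already exists, and the encoding already in the store is not applied to the appended data: the appended timestamp reads back as 1970-01-01T00:00:00.000.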
What did you expect to happen?
I expected the encoding to be consistent between the first write and the subsequent ones.
Minimal Complete Verifiable Example
import tempfile
from datetime import datetime, timedelta
import numpy as np
import xarray as xr
import zarr
with tempfile.TemporaryDirectory() as temp_dir:
    storage = temp_dir
    # Test parameters
    base_time = datetime(year=1, month=1, day=1)
    time_dtype = "datetime64[ms]"
    time_unit = "milliseconds since 1970-01-01T00:00:00"
    time_encoding = {
        "units": time_unit,
        "dtype": time_dtype,
    }
    print(f"Base time: {base_time}")
    print(f"Time encoding: {time_encoding}")
    # Write first timestep
    print("\n--- Writing first timestep ---")
    sim_time1 = base_time + timedelta(minutes=1)
    time_coord1 = np.array([np.datetime64(sim_time1, "ms")], dtype=time_dtype)
    data1 = xr.DataArray(
        data=np.array([[[1.0, 2.0], [3.0, 4.0]]]),
        coords={"time": time_coord1, "y": [0, 1], "x": [0, 1]},
        dims=["time", "y", "x"],
        name="test_var"
    )
    ds1 = xr.Dataset({"test_var": data1})
    ds1.to_zarr(
        storage,
        encoding={"time": time_encoding},  # Encoding provided here
        mode="w",
    )
    print(f"Written: {sim_time1}")
    # Write second timestep using append_dim (this is where the bug occurs)
    print("\n--- Writing second timestep with append_dim ---")
    sim_time2 = base_time + timedelta(minutes=2)
    time_coord2 = np.array([np.datetime64(sim_time2, "ms")], dtype=time_dtype)
    data2 = xr.DataArray(
        data=np.array([[[5.0, 6.0], [7.0, 8.0]]]),
        coords={"time": time_coord2, "y": [0, 1], "x": [0, 1]},
        dims=["time", "y", "x"],
        name="test_var"
    )
    ds2 = xr.Dataset({"test_var": data2})
    ds2.to_zarr(
        storage,
        append_dim="time",
        mode="a",
        # NOTE: Cannot pass encoding={"time": time_encoding} here!
        # xarray raises: "variable 'time' already exists, but encoding was provided"
    )
    print(f"Written: {sim_time2}")
    # Read back and demonstrate the bug
    print("\n--- Reading back data ---")
    ds_read = xr.open_zarr(storage)
    print(f"Time coordinate values: {ds_read['time'].values}")
    print(f"Time dtype: {ds_read['time'].dtype}")
    # Expected vs actual
    expected_times = [
        np.datetime64(sim_time1, "ms"),
        np.datetime64(sim_time2, "ms")
    ]
    actual_times = ds_read['time'].values
    print(f"\nExpected: {expected_times}")
    print(f"Actual: {actual_times}")
    # Check if bug is present
    assert np.array_equal(expected_times, actual_times)
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Base time: 0001-01-01 00:00:00
Time encoding: {'units': 'milliseconds since 1970-01-01T00:00:00', 'dtype': 'datetime64[ms]'}
--- Writing first timestep ---
/home/laurent/software/itzi/.venv/lib/python3.12/site-packages/zarr/api/asynchronous.py:228: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
warnings.warn(
Written: 0001-01-01 00:01:00
--- Writing second timestep with append_dim ---
/home/laurent/software/itzi/.venv/lib/python3.12/site-packages/zarr/api/asynchronous.py:228: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
warnings.warn(
Written: 0001-01-01 00:02:00
--- Reading back data ---
Time coordinate values: ['0001-01-01T00:01:00.000' '1970-01-01T00:00:00.000']
Time dtype: datetime64[ms]
Expected: [np.datetime64('0001-01-01T00:01:00.000'), np.datetime64('0001-01-01T00:02:00.000')]
Actual: ['0001-01-01T00:01:00.000' '1970-01-01T00:00:00.000']
Traceback (most recent call last):
File "/home/laurent/software/itzi/xarray_zarr_time.py", line 86, in <module>
assert np.array_equal(expected_times, actual_times)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.3 (main, Jun 18 2025, 17:59:45) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.14.0-27-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.4-development
xarray: 2025.7.1
pandas: 2.3.1
numpy: 2.3.2
scipy: 1.16.1
netCDF4: 1.7.2
pydap: 3.5.5
h5netcdf: 1.6.4
h5py: 3.14.0
zarr: 3.1.1
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.7.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 80.9.0
pip: None
conda: None
pytest: 8.4.1
mypy: None
IPython: None
sphinx: None
Thank you for reporting @lrntct! I agree that this is a bug. I tried setting the encoding directly on the time variable in the data array:
import numpy as np
import xarray as xr
storage = "foo"
time1 = "2001-01-01 00:01"
time2 = "2001-01-01 00:02"
time_encoding = {
    "dtype": "datetime64[ms]",
    "units": "milliseconds since 1970-01-01T00:00:00",
}
data1 = xr.DataArray(
    data=np.array([[[1.0, 2.0], [3.0, 4.0]]]),
    coords={
        "time": np.array([np.datetime64(time1, "ms")]),
        "y": [0, 1],
        "x": [0, 1]
    },
    dims=["time", "y", "x"],
    name="test_var"
)
data1.time.encoding = time_encoding
ds1 = xr.Dataset({"test_var": data1})
ds1.to_zarr(
    storage,
    mode="w",
)
data2 = xr.DataArray(
    data=np.array([[[5.0, 6.0], [7.0, 8.0]]]),
    coords={
        "time": np.array([np.datetime64(time2, "ms")]),
        "y": [0, 1],
        "x": [0, 1]
    },
    dims=["time", "y", "x"],
    name="test_var"
)
data2.time.encoding = time_encoding
ds2 = xr.Dataset({"test_var": data2})
ds2.to_zarr(
    storage,
    append_dim="time",
    mode="a",
)
ds_read = xr.open_zarr(storage)
print(f"Time coordinate values: {ds_read['time'].values}")
output:
Time coordinate values: ['2001-01-01T00:01:00.000' '1970-01-01T00:00:00.000']
I think the issue is that it's not clear whether it is xarray or zarr's responsibility to handle time encoding like this, but I'm not quite sure.
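For reference, here is a minimal NumPy-only sketch of what a "milliseconds since 1970-01-01T00:00:00" encoding amounts to: integer offsets from a reference time. The helper names `encode_ms` and `decode_ms` are made up for illustration and are not xarray or zarr functions. The sketch also shows that an encoded value of 0 decodes to the epoch, which matches the appended value read back above. Whether the store really holds a 0 for the appended step is an assumption that this snippet does not verify.

```python
import numpy as np

# Reference point and resolution taken from the units string in this thread:
# "milliseconds since 1970-01-01T00:00:00".
EPOCH = np.datetime64("1970-01-01T00:00:00", "ms")


def encode_ms(times: np.ndarray) -> np.ndarray:
    """Convert datetime64 values to int64 millisecond offsets from EPOCH."""
    return (times.astype("datetime64[ms]") - EPOCH).astype("int64")


def decode_ms(offsets: np.ndarray) -> np.ndarray:
    """Convert int64 millisecond offsets from EPOCH back to datetime64[ms]."""
    return EPOCH + offsets.astype("timedelta64[ms]")


times = np.array(["2001-01-01T00:01", "2001-01-01T00:02"], dtype="datetime64[ms]")
encoded = encode_ms(times)
decoded = decode_ms(encoded)

# An offset of 0 is the reference time itself (hypothetical stored value).
zero_decoded = decode_ms(np.array([0], dtype="int64"))

print(encoded)
print(decoded)
print(zero_decoded)
```

If the appended value were written without going through the encoding step, a round trip like this one would break, which is one possible reading of the symptom.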
Thank you for looking at this @jsignell! If this helps, my workaround for now is to write directly using zarr (it only handles a single value in the appended time coordinate):
def _zarr_append(self, store, dataset: xr.Dataset) -> None:
    """Zarr append using direct indexing."""
    # Open the zarr group
    z_group = zarr.open_group(store, mode="r+")
    # Get the new time value
    new_time = dataset["time"].values[0]
    # Append time coordinate
    current_time_size = z_group["time"].shape[0]
    z_group["time"].resize(current_time_size + 1)
    z_group["time"][current_time_size] = new_time
    # Append data for each variable
    for var_name, data_array in dataset.data_vars.items():
        current_shape = z_group[var_name].shape
        new_shape = (current_shape[0] + 1,) + current_shape[1:]
        z_group[var_name].resize(new_shape)
        # Use direct assignment
        z_group[var_name][current_shape[0]] = data_array.values[0]
This might be related to a zarr bug where it was (and sometimes still is) ignoring the config that was passed to it when appending: https://github.com/zarr-developers/zarr-python/issues/2979
which I could see being related to the encoding.
I wonder if #9154 and #3942 are related?
Yeah I was wondering if those ones were related too, but they feel kind of different I think. In this case there is no need for xarray/zarr-python to try to figure out the right encoding for the data. It should already know it!
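To make the expected behaviour concrete, here is a rough sketch of the rule described above: on append, reuse the encoding already in the store, and tolerate a user-supplied encoding only if it agrees with it. `resolve_append_encoding` is a hypothetical helper written for this comment. It is not part of xarray or zarr and does not reflect how either is implemented.

```python
from typing import Optional


def resolve_append_encoding(stored: dict, requested: Optional[dict]) -> dict:
    """Return the encoding to use for an append (hypothetical helper).

    The stored encoding always wins. A requested encoding is accepted only
    when every key it sets agrees with what the store already has.
    """
    if not requested:
        return dict(stored)
    conflicts = {
        key: (stored.get(key), value)
        for key, value in requested.items()
        if stored.get(key) != value
    }
    if conflicts:
        raise ValueError(f"encoding conflicts with existing store: {conflicts}")
    return dict(stored)


# Encoding from the example in this issue.
stored_encoding = {
    "units": "milliseconds since 1970-01-01T00:00:00",
    "dtype": "datetime64[ms]",
}

# No encoding passed on append: fall back to the stored one.
reused = resolve_append_encoding(stored_encoding, None)

# Same encoding passed again: accepted instead of raising.
same = resolve_append_encoding(stored_encoding, dict(stored_encoding))

# Different units: rejected.
try:
    resolve_append_encoding(stored_encoding, {"units": "seconds since 2000-01-01"})
    rejected = False
except ValueError:
    rejected = True
```

Under this rule the second `to_zarr()` call in the original example would behave the same whether or not the encoding is passed again.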