
TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.blosc.Blosc'> instead.

Open leoniewgnr opened this issue 11 months ago • 13 comments

This code runs without any problems with zarr2, but gives the following error when running with zarr3:

import pandas as pd
import numpy as np
import xarray as xr
from numcodecs.blosc import Blosc

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 5))},
    coords={
        "x": [10, 20, 30, 40],
        "y": pd.date_range("2000-01-01", periods=5),
        "z": ("x", list("abcd")),
    },
)

tmp_path = 'tmp.zarr'

# this works
ds.to_zarr(tmp_path, mode="w")
print('Saved to tmp.zarr')

# this does not work 
compressor = Blosc(cname="zstd", clevel=3, shuffle=2)
ds.to_zarr(tmp_path, encoding={"foo": {"compressor": compressor}}, mode="w")
print('Saved to tmp.zarr')

The error message is: TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.blosc.Blosc'> instead. The same error occurs in the documentation: https://docs.xarray.dev/en/stable/user-guide/io.html#zarr-compressors-and-filters

leoniewgnr avatar Feb 06 '25 13:02 leoniewgnr

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

welcome[bot] avatar Feb 06 '25 13:02 welcome[bot]

Thanks for raising this @leoniewgnr ! We're still hunting down all the bugs that the move to zarr 3 created.

The same error occurs in the documentation:

That's particularly weird - errors in the documentation examples are supposed to lead to errors in the CI...

TomNicholas avatar Feb 06 '25 16:02 TomNicholas

See also #9987

keewis avatar Feb 06 '25 17:02 keewis

I think this example needs to be updated for zarr-python 3. Something like this works for me:

diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst
index 986d43ce..7f5d6e2b 100644
--- a/doc/user-guide/io.rst
+++ b/doc/user-guide/io.rst
@@ -829,10 +829,10 @@ For example:
     :okwarning:
 
     import zarr
-    from numcodecs.blosc import Blosc
+    from zarr.codecs import BloscCodec
 
-    compressor = Blosc(cname="zstd", clevel=3, shuffle=2)
-    ds.to_zarr("foo.zarr", encoding={"foo": {"compressor": compressor}})
+    compressor = BloscCodec(cname="zstd", clevel=3, shuffle="shuffle")
+    ds.to_zarr("foo.zarr", encoding={"foo": {"compressors": (compressor,)}})
 
 .. note::

(This is my best guess based on what I see in the backend tests and some Zarr v3 related PRs. In this particular case, {"compressor": compressor} (without tuple) also seems to work.)

Perhaps @d-v-b can confirm this is now the proper way to specify encoders/help with this?

FedeMPouzols avatar Feb 09 '25 15:02 FedeMPouzols

That looks right, although I'm not too familiar with what ds.to_zarr is doing under the hood. The basic idea in zarr v3 is that there can be multiple codecs that transform an array after it has been flattened to a byte stream (alternately called "compressors" or "BytesBytesCodec"), hence the tuple. But we also accept a single codec, which we will wrap in a tuple.
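The wrapping and type check described above can be pictured with a toy model. This is only a sketch using stand-in classes: `BytesBytesCodec`, `ToyBlosc`, `LegacyBlosc`, and `normalize_compressors` here are hypothetical names defined in the snippet itself, not zarr's actual implementation.

```python
from abc import ABC


class BytesBytesCodec(ABC):
    """Stand-in for the abstract base class that zarr v3 checks against."""


class ToyBlosc(BytesBytesCodec):
    """Stand-in for a zarr v3 native codec (derives from the expected base)."""


class LegacyBlosc:
    """Stand-in for a zarr v2 style codec (unrelated class hierarchy)."""


def normalize_compressors(compressors):
    """Accept None, a single codec, or a list/tuple of codecs; return a tuple."""
    if compressors is None:
        return ()
    if not isinstance(compressors, (list, tuple)):
        # A single codec is wrapped in a one-element tuple.
        compressors = (compressors,)
    for codec in compressors:
        if not isinstance(codec, BytesBytesCodec):
            raise TypeError(
                f"Expected a BytesBytesCodec. Got {type(codec)} instead."
            )
    return tuple(compressors)


codec = ToyBlosc()
single = normalize_compressors(codec)      # single codec, wrapped
several = normalize_compressors([codec])   # list, converted to a tuple
```

Passing `LegacyBlosc()` to this toy function raises a `TypeError` with the same wording as the error in this issue, because the object does not derive from the expected base class.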

d-v-b avatar Feb 09 '25 15:02 d-v-b

My situation with numcodecs 0.15.1 and Zarr 3.0.3 mirrors this: BytesBytesCodec is unavailable in numcodecs.abc, and even numcodecs.Blosc is rejected with TypeError: Expected a BytesBytesCodec.

fowlerovski avatar Feb 19 '25 23:02 fowlerovski

I'm running into this as well, even when using numcodecs.zarr3.Blosc or zarr.codecs.BloscCodec.

roansong avatar Feb 20 '25 13:02 roansong

@FedeMPouzols, when I tried your suggested {"compressors": (compressor,)} form (with a tuple value and the plural key "compressors" instead of the older singular form), I still got the same "TypeError: Expected a BytesBytesCodec" that leoniewgnr reported. Ta.

aurelgriesser avatar Mar 03 '25 03:03 aurelgriesser

Didn't work for me either -- here's a reproducible example notebook: https://nbviewer.org/gist/rsignell/066cc39664a0c8b7fe70be1fd7d7e0cb

rsignell avatar Mar 04 '25 14:03 rsignell

Edited, because much simpler solution below.

~~It actually seems that the error is not with the compressed data array but with the coords. Xarray's default BloscCodec (used when no compressor is specified) inherits from numcodecs.abc.Codec, while it should inherit from zarr.abc.codec.BytesBytesCodec for zarr v3 to pass the isinstance assert. It also seems that xarray compresses the coords as well by default, thus using the default compressor that incorrectly inherits from the numcodecs.abc.Codec.~~

~~The (temporary) solution I found is to use zarr.codecs.BloscCodec for the data var like you would expect, and explicitly tell xarray not to compress the coordinates like so:~~

from zarr.codecs import BloscCodec

encoding = {
    "data": {
        "compressor": BloscCodec(
            cname="zstd",
            clevel=6,
        ),
    }
}
for coord in da.coords:
    encoding[coord] = {"compressor": None}

~~If you do want to compress the coords specifying the compressor explicitly from zarr.codecs should also likely work (not tested).~~

jensdebruijn avatar Mar 14 '25 17:03 jensdebruijn

Actually, the solution is a lot simpler: the codecs should be imported from numcodecs.zarr3 and it will work. We could maybe consider giving a clear warning and solution in the error message?

from numcodecs.zarr3 import Blosc

jensdebruijn avatar Mar 21 '25 14:03 jensdebruijn

I also ran into this issue when trying to load a Zarr v2 DataTree and write it to a Zarr v3 store. As suggested, I tried using:

from numcodecs.zarr3 import Blosc

But this gave me the following warning:

/srv/conda/envs/notebook/lib/python3.12/site-packages/numcodecs/zarr3.py:133: UserWarning: Numcodecs codecs are not in the Zarr version 3 specification and may not be supported by other zarr implementations.
  super().__init__(**codec_config)

Maybe I misunderstood something, but to avoid this warning and still use a Zarr v3-compatible compressor, I switched to the last suggestion in this issue, using

from zarr.codecs import BloscCodec

Here’s what worked for me with Sentinel 1 sample data from the EOPF Zarr sample service:

import xarray as xr
# S1A_IW_GRDH_1SDV_20240201T164915_20240201T164940_052368_065517_750E.SAFE
path = (
"https://objectstore.eodc.eu:2222/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/"
"cpm_v253/S1A_IW_GRDH_1SDV_20240201T164915_20240201T164940_052368_065517_750E.zarr"
)
s1_grdh = xr.open_datatree(path, engine="zarr",chunks={})
#s1_grdh.to_zarr('s1_grdh_z2.zarr', zarr_format=2,mode='w')  ok
#s1_grdh.to_zarr('s1_grdh_z3.zarr', zarr_format=3,mode='w')  not ok
print(s1_grdh['/S01SIWGRD_20240201T164915_0025_A299_750E_065517_VH/measurements']['grd'].encoding['compressors'])
from pathlib import PurePosixPath
#from numcodecs.zarr3 import Blosc
#compressor = Blosc(cname="zstd", clevel=3,shuffle=2, blocksize=0 )  #warning
from zarr.codecs import BloscCodec
compressor = BloscCodec(cname="zstd", clevel=3,shuffle='bitshuffle', blocksize=0 )

encoding = {}

for node in s1_grdh.subtree:
    if node.ds is not None:
        group = str(node.path) if node.path != PurePosixPath("") else "."
        encoding[group] = {}
        for var in node.ds.data_vars:
            encoding[group][var] = {"compressors": [compressor]}
        for coord in node.ds.coords:
            encoding[group][coord] = {"compressors": [compressor]}
s1_grdh.to_zarr("s1_grdh_v3.zarr", zarr_format=3, encoding=encoding, mode="w")

This worked for me, with only warnings about consolidated metadata, and produced a valid Zarr v3 store. Does anyone have suggestions for avoiding this messy rewrite of the encoding for DataTree?
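One way to tidy the loop is to factor it into a small helper. The sketch below is not an xarray API: `build_tree_encoding` is a hypothetical function, and it assumes each node exposes `path` and `ds.variables` (data variables plus coordinates) and that the encoding is keyed by `node.path`. The demo uses stand-in objects so it runs without any data; with real data the tree would come from `xr.open_datatree(...)` and the codec from `zarr.codecs`.

```python
from types import SimpleNamespace


def build_tree_encoding(tree, per_variable):
    """Return {group path: {variable name: copy of per_variable}} for a tree."""
    encoding = {}
    for node in tree.subtree:
        names = list(node.ds.variables)  # data variables and coordinates
        if names:
            encoding[node.path] = {name: dict(per_variable) for name in names}
    return encoding


# Stand-ins for a DataTree and a codec, for illustration only.
compressor = object()
root = SimpleNamespace(path="/", ds=SimpleNamespace(variables={}))
child = SimpleNamespace(
    path="/measurements",
    ds=SimpleNamespace(variables={"grd": None, "azimuth_time": None}),
)
tree = SimpleNamespace(subtree=[root, child])

encoding = build_tree_encoding(tree, {"compressors": [compressor]})
```

The resulting dictionary would then be passed as `encoding=encoding` to `to_zarr`, as in the code above.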

P.s I used zarr: 3.0.6 numcodecs: 0.15.1 xarray: 2025.3.0

P.P.s and thank you, xarray developers, for making DataTree work with zarr3!!

tinaok avatar Mar 22 '25 09:03 tinaok

When I've run into this issue (the "TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.blosc.Blosc'> instead." error when saving Zarr 3 stores from data originally read from Zarr 2 stores), the following workaround has worked for me:

ds = ds.drop_encoding()

ds.to_zarr(...)

(my workflows are thus far agnostic to the specific codecs used, so this has definitely been sufficient)
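A small sketch of what `drop_encoding` does, assuming xarray and numpy are installed. The string stored under "compressor" is only a placeholder for a codec object carried over from a Zarr 2 store. Note that dropping the encoding discards everything stored there, not just the compressor, so the writer falls back to its defaults.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x",), np.arange(4.0))},
    coords={"x": [10, 20, 30, 40]},
)

# Simulate encoding carried over from reading a Zarr 2 store.
ds["foo"].encoding["compressor"] = "placeholder for a zarr v2 codec"

# drop_encoding returns a new dataset whose variables have empty encodings.
clean = ds.drop_encoding()

print(ds["foo"].encoding)
print(clean["foo"].encoding)
```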

ks905383 avatar Dec 10 '25 16:12 ks905383