LZMA custom filter pipeline ValueError: Cannot specify filters except with FORMAT_RAW
Minimal, reproducible code sample, a copy-pastable example if possible
I am trying to create a zarr array with a custom LZMA filter pipeline, based on the example from the docs. As far as I can tell, that example is broken: it only appears to work because it never actually reads or writes any data.
```python
import numpy as np
import zarr
import lzma

filters = [dict(id=lzma.FILTER_DELTA, dist=4),
           dict(id=lzma.FILTER_LZMA2, preset=1)]
shape = (1, 2)
dtype = np.dtype('i4')
store = {}

# write an array with lzma compression, works fine, no errors
za = zarr.create(shape, chunks=False, store=store, dtype=dtype,
                 compression='lzma',
                 compression_opts=dict(filters=filters))
za[:] = np.zeros(shape, dtype)

# now open it and read the data back
za2 = zarr.open(store)
za2[:]
```
The last line produces the error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-121-1d0a9dbc12bc> in <module>()
     13
     14 za2 = zarr.open(store)
---> 15 za2[:]

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in __getitem__(self, selection)
    570
    571 fields, selection = pop_fields(selection)
--> 572 return self.get_basic_selection(selection, fields=fields)
    573
    574 def get_basic_selection(self, selection=Ellipsis, out=None, fields=None):

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in get_basic_selection(self, selection, out, fields)
    696 else:
    697 return self._get_basic_selection_nd(selection=selection, out=out,
--> 698 fields=fields)
    699
    700 def _get_basic_selection_zd(self, selection, out=None, fields=None):

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in _get_basic_selection_nd(self, selection, out, fields)
    738 indexer = BasicIndexer(selection, self)
    739
--> 740 return self._get_selection(indexer=indexer, out=out, fields=fields)
    741
    742 def get_orthogonal_selection(self, selection, out=None, fields=None):

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in _get_selection(self, indexer, out, fields)
   1026 # load chunk selection into output array
   1027 self._chunk_getitem(chunk_coords, chunk_selection, out, out_selection,
-> 1028 drop_axes=indexer.drop_axes, fields=fields)
   1029
   1030 if out.shape:

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in _chunk_getitem(self, chunk_coords, chunk_selection, out, out_selection, drop_axes, fields)
   1613
   1614 if self._compressor:
-> 1615 self._compressor.decode(cdata, dest)
   1616 else:
   1617 chunk = ensure_ndarray(cdata).view(self._dtype)

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/numcodecs/lzma.py in decode(self, buf, out)
     63
     64 # do decompression
---> 65 dec = _lzma.decompress(buf, format=self.format, filters=self.filters)
     66
     67 # handle destination

~/.conda/envs/geo_scipy/lib/python3.6/lzma.py in decompress(data, format, memlimit, filters)
    330 results = []
    331 while True:
--> 332 decomp = LZMADecompressor(format, memlimit, filters)
    333 try:
    334 res = decomp.decompress(data)

ValueError: Cannot specify filters except with FORMAT_RAW
```
Problem description
The same write-then-read round trip works fine with other compressors (e.g. Blosc), so I suspect there is a problem with how the LZMA codec's attributes are being encoded.
Perhaps this issue belongs in numcodecs instead.
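For reference, the same error can be reproduced with the standard library alone. The last frame of the traceback shows numcodecs passing `self.filters` to `lzma.decompress` together with `self.format`; this sketch assumes that format is the `.xz` container (`lzma.FORMAT_XZ`), since only `filters` was set in `compression_opts`. `lzma.compress` accepts a custom filter chain with `FORMAT_XZ`, but `lzma.decompress` only accepts `filters` with `FORMAT_RAW`:

```python
import lzma

filters = [dict(id=lzma.FILTER_DELTA, dist=4),
           dict(id=lzma.FILTER_LZMA2, preset=1)]
data = bytes(8)  # same size as np.zeros((1, 2), 'i4')

# Compressing into an .xz container with a custom filter chain is allowed...
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

# ...but handing the same filters to decompress() with FORMAT_XZ is rejected.
try:
    lzma.decompress(xz_blob, format=lzma.FORMAT_XZ, filters=filters)
except ValueError as exc:
    print(exc)  # prints: Cannot specify filters except with FORMAT_RAW

# The .xz container records its filter chain, so decoding needs no filters.
assert lzma.decompress(xz_blob, format=lzma.FORMAT_XZ) == data

# FORMAT_RAW has no container, so the chain must be supplied on both sides.
raw_blob = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)
assert lzma.decompress(raw_blob, format=lzma.FORMAT_RAW, filters=filters) == data
```

In other words, the encoded attributes may be fine; what fails is forwarding `filters` to the decoder for a self-describing `.xz` stream.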
Version and installation information
Please provide the following:
- Value of `zarr.__version__`: 2.3.1
- Value of `numcodecs.__version__`: 0.6.3
- Version of Python interpreter: 3.6
- Operating system (Linux/Windows/Mac): Mac
- How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): conda-forge
Sorry for the trouble. Thanks for catching that.
We do test the docs in CI, but we probably didn't catch this because the docs example never actually used the compressor. Actually using it is what triggers the problem, as you nicely isolated in the numcodecs issue (https://github.com/zarr-developers/numcodecs/issues/188). I think once we fix the numcodecs issue, the docs CI build here will start failing on problems like this, which should help us catch them in the future.
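A minimal sketch of the kind of numcodecs fix discussed above, assuming the decoder should forward the filter chain only for headerless `FORMAT_RAW` data. `lzma_decode` is a hypothetical helper written for illustration, not the actual numcodecs code or patch. The loop at the end round-trips real data through both formats, which is the sort of check that would surface this bug in CI:

```python
import lzma


def lzma_decode(buf, format=lzma.FORMAT_XZ, filters=None):
    """Hypothetical decode helper: pass `filters` only when format is FORMAT_RAW."""
    if format == lzma.FORMAT_RAW:
        # Raw streams have no container header, so the decoder needs the chain.
        return lzma.decompress(buf, format=format, filters=filters)
    # XZ, ALONE and AUTO streams describe their own filters; passing them raises ValueError.
    return lzma.decompress(buf, format=format)


filters = [dict(id=lzma.FILTER_DELTA, dist=4),
           dict(id=lzma.FILTER_LZMA2, preset=1)]
data = bytes(8)  # same size as np.zeros((1, 2), 'i4')

# Round-trip actual data through both formats using the same filter chain.
for fmt in (lzma.FORMAT_XZ, lzma.FORMAT_RAW):
    blob = lzma.compress(data, format=fmt, filters=filters)
    assert lzma_decode(blob, format=fmt, filters=filters) == data
```

The design choice here is to keep `filters` on the codec for both formats (it is still needed when encoding) and only drop it at decode time when the container already carries it.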
Just ran into this problem. It looks like the workaround proposed by @rabernat works. Would you accept a pull request that updates the documentation?