zarr-python

LZMA custom filter pipeline ValueError: Cannot specify filters except with FORMAT_RAW

Open rabernat opened this issue 6 years ago • 2 comments

Minimal, reproducible code sample, a copy-pastable example if possible

I am trying to create a zarr array with a custom LZMA filter pipeline, based on the example from the docs. I have concluded that the docs example is itself broken, and that this goes unnoticed because it never actually reads or writes any data.

import numpy as np
import zarr
import lzma
filters = [dict(id=lzma.FILTER_DELTA, dist=4),
           dict(id=lzma.FILTER_LZMA2, preset=1)]

shape = (1, 2)
dtype = np.dtype('i4')
store = {}
# write an array with lzma compression, works fine, no errors
za = zarr.create(shape, chunks=False, store=store, dtype=dtype, 
                 compression='lzma',
                 compression_opts=dict(filters=filters))
za[:] = np.zeros(shape, dtype)

# now open it and read the data back
za2 = zarr.open(store)
za2[:]

The last line produces the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-121-1d0a9dbc12bc> in <module>()
     13 
     14 za2 = zarr.open(store)
---> 15 za2[:]

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in __getitem__(self, selection)
    570 
    571         fields, selection = pop_fields(selection)
--> 572         return self.get_basic_selection(selection, fields=fields)
    573 
    574     def get_basic_selection(self, selection=Ellipsis, out=None, fields=None):

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in get_basic_selection(self, selection, out, fields)
    696         else:
    697             return self._get_basic_selection_nd(selection=selection, out=out,
--> 698                                                 fields=fields)
    699 
    700     def _get_basic_selection_zd(self, selection, out=None, fields=None):

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in _get_basic_selection_nd(self, selection, out, fields)
    738         indexer = BasicIndexer(selection, self)
    739 
--> 740         return self._get_selection(indexer=indexer, out=out, fields=fields)
    741 
    742     def get_orthogonal_selection(self, selection, out=None, fields=None):

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in _get_selection(self, indexer, out, fields)
   1026             # load chunk selection into output array
   1027             self._chunk_getitem(chunk_coords, chunk_selection, out, out_selection,
-> 1028                                 drop_axes=indexer.drop_axes, fields=fields)
   1029 
   1030         if out.shape:

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/zarr/core.py in _chunk_getitem(self, chunk_coords, chunk_selection, out, out_selection, drop_axes, fields)
   1613 
   1614                     if self._compressor:
-> 1615                         self._compressor.decode(cdata, dest)
   1616                     else:
   1617                         chunk = ensure_ndarray(cdata).view(self._dtype)

~/.conda/envs/geo_scipy/lib/python3.6/site-packages/numcodecs/lzma.py in decode(self, buf, out)
     63 
     64             # do decompression
---> 65             dec = _lzma.decompress(buf, format=self.format, filters=self.filters)
     66 
     67             # handle destination

~/.conda/envs/geo_scipy/lib/python3.6/lzma.py in decompress(data, format, memlimit, filters)
    330     results = []
    331     while True:
--> 332         decomp = LZMADecompressor(format, memlimit, filters)
    333         try:
    334             res = decomp.decompress(data)

ValueError: Cannot specify filters except with FORMAT_RAW

Problem description

This works fine with other compressors (e.g. Blosc). I suspect there is a problem with how the LZMA codec attributes are being encoded.

Perhaps this issue belongs in numcodecs...
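
For reference, the same ValueError can be reproduced with the standard library alone, without zarr or numcodecs. The sketch below assumes the codec ends up calling lzma.decompress with a non-raw container format (here lzma.FORMAT_XZ, chosen for illustration) together with the filter chain, as the last two traceback frames suggest. The variable names are only for this example.

```python
import lzma

# Same filter chain as in the report above.
filters = [dict(id=lzma.FILTER_DELTA, dist=4),
           dict(id=lzma.FILTER_LZMA2, preset=1)]

raw = bytes(8)  # eight zero bytes, the size of a (1, 2) int32 array of zeros

# Compressing into the .xz container with a custom filter chain is accepted.
compressed = lzma.compress(raw, format=lzma.FORMAT_XZ, filters=filters)

# Passing the same format and filters to decompress is rejected.
try:
    lzma.decompress(compressed, format=lzma.FORMAT_XZ, filters=filters)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# The .xz container stores the filter chain, so decoding needs no filters.
roundtrip_xz = lzma.decompress(compressed, format=lzma.FORMAT_XZ)

# With FORMAT_RAW the stream has no header, so filters are required on both sides.
compressed_raw = lzma.compress(raw, format=lzma.FORMAT_RAW, filters=filters)
roundtrip_raw = lzma.decompress(compressed_raw, format=lzma.FORMAT_RAW,
                                filters=filters)
```

If that assumption holds, the asymmetry is in the lzma module itself: filters may accompany any format when compressing, but only FORMAT_RAW when decompressing, so a codec that forwards the same arguments in both directions fails on read.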

Version and installation information

Please provide the following:

  • Value of zarr.__version__: 2.3.1
  • Value of numcodecs.__version__: 0.6.3
  • Version of Python interpreter: 3.6
  • Operating system (Linux/Windows/Mac): Mac
  • How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): conda-forge

rabernat • May 03 '19 03:05

Sorry for the trouble. Thanks for catching that.

We do test the docs in CI, but we likely missed this because the example never actually used the compressor. Actually using the compressor is what triggers the problem, as you nicely isolated in https://github.com/zarr-developers/numcodecs/issues/188. I think that once we fix the numcodecs issue, the docs CI build here will start failing on problems like this, which would help catch them in the future.

jakirkham • May 03 '19 03:05

Just ran into this problem. It looks like the workaround proposed by @rabernat works. Would you accept a pull request that updates the documentation?

selimnairb • Mar 12 '24 19:03