ZSTD compression with dictionary causes odd errors

Open alekepd opened this issue 1 year ago • 5 comments

When the "use_dict" flag is enabled in the compression parameters of an SChunk instance, appending data raises an error.

The following code does not execute on my system:

import blosc2
import numpy as np

CHUNKSIZE = int(2**12)
NCHUNKS = 5

coptions = blosc2.cparams_dflts.copy()
coptions["codec"] = blosc2.Codec.ZSTD # this is already the default
coptions["use_dict"] = 1


_rng = np.random.default_rng()


def _make_data() -> bytes:
    return _rng.random(CHUNKSIZE // 4, dtype=np.float32).tobytes()


data = [_make_data() for _ in range(NCHUNKS)]

storage = blosc2.SChunk(
    chunksize=CHUNKSIZE, cparams=coptions, dparams=blosc2.dparams_dflts
)

for x in data:
    storage.append_data(x)

for index, x in enumerate(data):
    assert storage.decompress_chunk(index) == x

Instead, it leads to the following RuntimeError:

Traceback (most recent call last):
  File "/home/user/minimal_bug.py", line 26, in <module>
    storage.append_data(x)
  File "/home/user/env/lib/python3.9/site-packages/blosc2/schunk.py", line 298, in append_data
    return super(SChunk, self).append_data(data)
  File "blosc2_ext.pyx", line 1105, in blosc2.blosc2_ext.SChunk.append_data
RuntimeError: Could not append the buffer

If the above code is run with coptions["use_dict"] = 0, it executes successfully.

Do specific flags need to be set for shared dictionary compression to be successful, or does the sizing of stored data have different requirements?

python-blosc2 version: blosc2==2.3.2
Python version: 3.9.18
Platform: Arch Linux, conda-based Python install

alekepd avatar Jun 14 '24 15:06 alekepd

This behavior persists with Python 3.10 and python-blosc2 2.6.1; the corresponding line in the traceback is 1110 in blosc2_ext.pyx.

alekepd avatar Jun 15 '24 09:06 alekepd

I do not see any case or test in this repository where this option is activated. Is it meant to be functional in the current release?

alekepd avatar Jun 17 '24 08:06 alekepd

We have not made any effort to make this functional, but a PR is always welcome.

FrancescAlted avatar Jun 17 '24 09:06 FrancescAlted

Understood. I will look at what would be required for a PR. Has the shared dict functionality been tested in c-blosc2?

alekepd avatar Jun 17 '24 09:06 alekepd

Yes, I think so: https://github.com/Blosc/c-blosc2/blob/main/tests/test_dict_schunk.c

FrancescAlted avatar Jun 17 '24 10:06 FrancescAlted