ZSTD compression with dictionary causes odd errors
When activating the "use_dict" flag in an SChunk instance, storing data leads to errors.
The following code does not execute on my system:
import blosc2
import numpy as np

CHUNKSIZE = int(2**12)
NCHUNKS = 5

coptions = blosc2.cparams_dflts.copy()
coptions["codec"] = blosc2.Codec.ZSTD  # this is already the default
coptions["use_dict"] = 1

_rng = np.random.default_rng()

def _make_data() -> bytes:
    return _rng.random(CHUNKSIZE // 4, dtype=np.float32).tobytes()

data = [_make_data() for x in range(NCHUNKS)]

storage = blosc2.SChunk(
    chunksize=CHUNKSIZE, cparams=coptions, dparams=blosc2.dparams_dflts
)
for x in data:
    storage.append_data(x)
for index, x in enumerate(data):
    assert storage.decompress_chunk(index) == x
Instead, it leads to the following RuntimeError:
Traceback (most recent call last):
  File "/home/user/minimal_bug.py", line 26, in <module>
    storage.append_data(x)
  File "/home/user/env/lib/python3.9/site-packages/blosc2/schunk.py", line 298, in append_data
    return super(SChunk, self).append_data(data)
  File "blosc2_ext.pyx", line 1105, in blosc2.blosc2_ext.SChunk.append_data
RuntimeError: Could not append the buffer
If the above code is run with coptions["use_dict"] = 0, it executes successfully.
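To compare the two settings side by side, here is a minimal sketch that reuses only the calls from the script above. The helper name roundtrip_ok is hypothetical, random bytes from os.urandom stand in for the float32 data, and the import guard is only there so the sketch still loads where python-blosc2 is not installed. It catches RuntimeError because that is the exception shown in the traceback; other versions may fail differently.

```python
import os

try:
    import blosc2
except ImportError:  # assumption: allow the sketch to load without python-blosc2
    blosc2 = None

CHUNKSIZE = 2**12
NCHUNKS = 5


def roundtrip_ok(use_dict):
    """Hypothetical helper: True if every chunk round-trips, False if a
    RuntimeError is raised, None if blosc2 is not installed."""
    if blosc2 is None:
        return None
    cparams = blosc2.cparams_dflts.copy()
    cparams["codec"] = blosc2.Codec.ZSTD
    cparams["use_dict"] = use_dict
    # Random bytes stand in for the float32 payload of the original script.
    chunks = [os.urandom(CHUNKSIZE) for _ in range(NCHUNKS)]
    try:
        schunk = blosc2.SChunk(chunksize=CHUNKSIZE, cparams=cparams)
        for chunk in chunks:
            schunk.append_data(chunk)
    except RuntimeError:
        return False
    # Compare each decompressed chunk with the bytes that were appended.
    return all(schunk.decompress_chunk(i) == c for i, c in enumerate(chunks))


if __name__ == "__main__":
    for flag in (0, 1):
        print(f"use_dict={flag}: round trip ok = {roundtrip_ok(flag)}")
```

On the setups described in this report, the use_dict=1 case is the one expected to report a failure.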
Do specific flags need to be set for shared dictionary compression to be successful, or does the sizing of stored data have different requirements?
python-blosc2 version: blosc2==2.3.2
Python version: 3.9.18
platform: Arch Linux, conda-based Python install
This behavior persists with Python 3.10 and python-blosc2 2.6.1; the corresponding line in the trace is then line 1110 of blosc2_ext.pyx.
I do not see any case or test in this repository where this option is activated. Is it meant to be functional in the current release?
We have not put any effort into making this functional, but a PR is always welcome.
Understood. I will look at what would be required for a PR. Has the shared dictionary functionality been tested in c-blosc2?
Yes, I think so: https://github.com/Blosc/c-blosc2/blob/main/tests/test_dict_schunk.c