python-blosc

Concatenate two blosc compressed bytes objects

Open marchinidavide opened this issue 2 years ago • 2 comments

Hi, I am trying to concatenate two Blosc-compressed bytes objects with the following code. Does a function like blosc_magic_concat already exist, or is it something I could implement myself?


# python version = 3.10
# blosc version = 1.10.2
# numpy version = 1.22.4

import blosc
import numpy as np

blosc.set_nthreads(1)
rng = np.random.default_rng(seed=1)

a1 = rng.standard_normal(4)
a2 = rng.standard_normal(5)
a = np.concatenate((a1, a2))
b1 = a1.tobytes(order="C")
b2 = a2.tobytes(order="C")
b = b"".join((b1, b2))
assert np.all(a == np.frombuffer(b, "float64"))

c1 = blosc.compress(b1)
c2 = blosc.compress(b2)
c = blosc_magic_concat(c1, c2)  # need to implement this function such that following assertion is true
cd = blosc.decompress(c)
assert np.all(a == np.frombuffer(cd, "float64"))

blosc.decompress(b"".join((c1, c2)))  # error: Error 104 : not a Blosc buffer or header info is corrupted

marchinidavide, Jun 20 '22 21:06

This is already implemented in python-blosc2 :-) Not only chunk concatenation, but also deletions and effective insertions. See this example: https://github.com/Blosc/python-blosc2/blob/main/examples/schunk.py

FrancescAlted, Jun 21 '22 05:06

Thank you! Will give it a try!

marchinidavide, Jun 21 '22 09:06