bloscpack
numpy savez_compressed much smaller filesizes for small arrays
I have a few million images to save to disk and have been trying a few options out. I thought blosc/bloscpack would be well suited, but I'm getting far larger file sizes than with the standard numpy savez_compressed.
My images have shape (3, 200, 200) and dtype=float32. Typical file sizes I'm getting are:
- np.savez: ~470k
- np.savez_compressed: ~53k
- blosc.pack_array: ~200k
- blosc.compress_ptr: ~200k
- bloscpack.pack_ndarray_to_file: ~200-400k
For a sample of 370 images this gives:
67M ./blosc_packarray
67M ./blosc_pointer
121M ./bp
19M ./npz
172M ./uncompressed
For the blosc_* methods I'm writing the packed bytes like:
with open(dest, 'wb') as f:
    f.write(packed)
Is there anything I'm missing or is numpy's compression just as good as it gets for small images like these?
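For reference, a stripped-down version of the comparison (simplified sketch; random data stands in for a real image, and random floats are close to a worst case for compression, so real images should do better):

import os
import numpy as np
import blosc

images = np.random.rand(3, 200, 200).astype(np.float32)  # stand-in for a real image

# numpy's deflate-based compression.
np.savez_compressed('img.npz', images=images)

# Blosc with the default codec, max level, byte shuffle.
packed = blosc.pack_array(images, clevel=9, shuffle=blosc.SHUFFLE)
with open('img.blosc', 'wb') as f:
    f.write(packed)

print('npz:  ', os.path.getsize('img.npz'))
print('blosc:', os.path.getsize('img.blosc'))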
@lopsided thank you for asking about this. What settings are you using for Blosc and bloscpack? Maybe you need to use a higher compression level (like 9) and/or change the internal algorithm; I think it could be worth a shot.
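For example, a quick sweep over the codecs and shuffle filters your Blosc build supports (untested sketch; images here is faked with random data in place of your real array) would show which combination wins:

import numpy as np
import blosc

images = np.random.rand(3, 200, 200).astype(np.float32)  # stand-in for a real image

# Try every available codec with each shuffle filter at max level.
for cname in blosc.compressor_list():  # e.g. blosclz, lz4, lz4hc, zlib, zstd
    for shuffle in (blosc.NOSHUFFLE, blosc.SHUFFLE, blosc.BITSHUFFLE):
        packed = blosc.pack_array(images, clevel=9, shuffle=shuffle, cname=cname)
        print(cname, shuffle, len(packed))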
@lopsided a list of settings to explore is here: https://github.com/Blosc/bloscpack#settings
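With bloscpack the same knobs go through BloscArgs. A sketch based on that settings section (I'm assuming BloscArgs is importable from the top-level package, and zstd is just one codec worth trying):

import numpy as np
import bloscpack as bp
from bloscpack import BloscArgs

images = np.random.rand(3, 200, 200).astype(np.float32)  # stand-in for a real image

# Ask for zstd at maximum level instead of the defaults; cname and
# clevel are the main knobs to experiment with.
blosc_args = BloscArgs(typesize=images.dtype.itemsize,
                       clevel=9, shuffle=True, cname='zstd')
bp.pack_ndarray_to_file(images, 'img.blp', blosc_args=blosc_args)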
If you can share the data or an anonymized variant that has similar entropy we could look into this in more detail.
Thanks for the quick reply!
I've just been using pretty much default settings:
packed = blosc.compress_ptr(
    address=images.__array_interface__['data'][0],
    items=images.size,
    typesize=images.dtype.itemsize,
    clevel=9,
    shuffle=blosc.SHUFFLE
)
packed = blosc.pack_array(images)
bp.pack_ndarray_to_file(images, dest)
I've attached an example image (actually a triplet of greyscale images), saved uncompressed using np.savez. (I had to rename it to .zip to make github happy).
Thank you, it may take me a few days to tinker.
I'm so sorry, but there has been no space left in my schedule to look into this.