python-blosc

Blosc 1.19.0 deprecates support for 'snappy'?

Open aEgoist opened this issue 5 years ago • 3 comments

I'm loading an HDF file that was written with pandas.to_hdf(..., complib="blosc:snappy") under Python 3.7 installed by Anaconda. After I upgraded Anaconda to Python 3.8, reading it shows:

HDF5ExtError: HDF5 error back trace

  File "C:\ci\hdf5_1545244154871\work\src\H5Dio.c", line 199, in H5Dread
    can't read data
  File "C:\ci\hdf5_1545244154871\work\src\H5Dio.c", line 601, in H5D__read
    can't read data
  File "C:\ci\hdf5_1545244154871\work\src\H5Dchunk.c", line 2229, in H5D__chunk_read
    unable to read raw data chunk
  File "C:\ci\hdf5_1545244154871\work\src\H5Dchunk.c", line 3609, in H5D__chunk_lock
    data pipeline read failed
  File "C:\ci\hdf5_1545244154871\work\src\H5Z.c", line 1326, in H5Z_pipeline
    filter returned failure during read
  File "hdf5-blosc/src/blosc_filter.c", line 188, in blosc_filter
    this Blosc library does not have support for the 'snappy' compressor, but only for: blosclz,lz4,lz4hc,zlib,zstd

End of HDF5 error back trace

Problems reading the array data.

Seems like Blosc 1.19.0 deprecates support for 'snappy'?

aEgoist avatar Sep 02 '20 03:09 aEgoist

Well, what happens is that Snappy is no longer compiled by default, but it can be reactivated via cmake -D DEACTIVATE_SNAPPY=OFF. It should just be a matter of recompiling.

The thing with Snappy is that it is the only C++ codec integrated in C-Blosc, and we plan to get rid of it in the long term (e.g. C-Blosc2 will not vendor Snappy anymore) so as to ease our build process as much as possible.
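To see whether a given installation is affected, one can compare the codecs a file needs against the codecs the local Blosc build reports. The following is a minimal sketch, assuming PyTables is installed and exposes `tables.blosc_compressor_list()`; the helper names `available_blosc_codecs` and `missing_codecs` are hypothetical and only for illustration.

```python
def available_blosc_codecs():
    """Return the codec names reported by the Blosc build used by PyTables.

    Returns None when PyTables cannot be imported. Assumes that
    tables.blosc_compressor_list() is available in the installed version.
    """
    try:
        import tables
    except ImportError:
        return None
    return list(tables.blosc_compressor_list())


def missing_codecs(required, available):
    """Return the required codec names that are absent from `available`, sorted."""
    return sorted(set(required) - set(available))
```

For example, calling `missing_codecs(["snappy"], available_blosc_codecs() or [])` returns a non-empty list when the local build lacks Snappy, which matches the situation described by the error message in the report above.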

FrancescAlted avatar Sep 02 '20 15:09 FrancescAlted

There are plenty of data files encoded with Blosc (w/ Snappy) out there. Making this sort of change will break many applications that use either Blosc or a derivative like PyTables. @FrancescAlted, it sounds like a very high price to pay for simplifying a build process.

alobbs avatar Sep 29 '20 00:09 alobbs

@alobbs I understand your concerns. For what it's worth, I tried to vendor a more modern Snappy than 1.1.1, but after that version they changed the sources quite significantly. If I remember correctly, a configuration step was needed for every platform, making the installation significantly more complex, so this is another reason why we don't vendor it anymore.

If recompiling with cmake -D DEACTIVATE_SNAPPY=OFF and having the Snappy library available on your system is not enough for you (probably because you rely on wheels or other binary packaging), I can have another look at this (but don't hold your breath). Really, though, we would much appreciate it if somebody could contribute a way of vendoring a recent Snappy again without making the installation procedure much harder.

At any rate, as Snappy is definitely being deprecated in C-Blosc2, do yourself a favor and convert your blosc:snappy files to another codec (I'd recommend LZ4, which is well maintained and offers excellent compression and performance). If your files were created with PyTables, you can use the ptrepack --complib blosc:lz4 snappy_file lz4_file utility to do the conversion easily.
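For converting several files, the ptrepack call above can be wrapped in a small script. This is only a sketch: the helper names `build_ptrepack_cmd` and `convert` are hypothetical, and it assumes `ptrepack` is on the PATH of an environment whose Blosc build can still read Snappy (for example the older Python 3.7 environment), since the source file has to be decompressed before it can be rewritten.

```python
import subprocess


def build_ptrepack_cmd(src, dst, complib="blosc:lz4"):
    """Build the ptrepack argument list that rewrites `src` into `dst`
    using the given compression library."""
    return ["ptrepack", "--complib", complib, str(src), str(dst)]


def convert(src, dst, complib="blosc:lz4"):
    """Run ptrepack; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ptrepack_cmd(src, dst, complib), check=True)
```

Calling `convert("snappy_file.h5", "lz4_file.h5")` (file names are placeholders) runs the same command as the one-liner above; the rewritten file should then open in an environment where Blosc is built without Snappy.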

FrancescAlted avatar Sep 30 '20 15:09 FrancescAlted