numcodecs
How to unbundle blosc, zstd, lz4
From a distribution point of view, it is undesirable to bundle outdated copies of standard system libraries into packages. Here is a quick guide to unbundling blosc, zstd, and lz4 from numcodecs and using the system libraries instead:
- Delete the c-blosc subdirectory.
- Patch setup.py to link the extensions against the system libraries:
Index: numcodecs-0.7.2/setup.py
===================================================================
--- numcodecs-0.7.2.orig/setup.py
+++ numcodecs-0.7.2/setup.py
@@ -112,6 +112,7 @@ def blosc_extension():
         Extension('numcodecs.blosc',
                   sources=sources + blosc_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if blosc_sources else ['blosc'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),
@@ -152,6 +153,7 @@ def zstd_extension():
         Extension('numcodecs.zstd',
                   sources=sources + zstd_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if zstd_sources else ['zstd'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),
@@ -185,6 +187,7 @@ def lz4_extension():
         Extension('numcodecs.lz4',
                   sources=sources + lz4_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if lz4_sources else ['lz4'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),
The modification to setup.py would need some more work, but you may want to consider adding the ability to select building against system libraries through an environment variable, e.g. USE_SYSTEM_LIBS=1 (a rough sketch follows below).
This also resolves the problems reported in #215.
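For illustration, a rough sketch of what that environment-variable switch could look like in setup.py. The variable name USE_SYSTEM_LIBS and the glob pattern for the bundled sources are assumptions; the Extension keyword arguments follow the patch above.

# Hypothetical sketch: let USE_SYSTEM_LIBS=1 skip the bundled c-blosc
# sources and link against the system libblosc instead. Names follow
# the setup.py excerpt above; the glob pattern is an assumption.
import os
from glob import glob
from setuptools import Extension

USE_SYSTEM_LIBS = os.environ.get('USE_SYSTEM_LIBS', '0') == '1'

def blosc_extension():
    sources = ['numcodecs/blosc.pyx']
    if USE_SYSTEM_LIBS:
        blosc_sources = []   # nothing bundled to compile
        include_dirs = []    # system include paths suffice
    else:
        blosc_sources = glob('c-blosc/blosc/*.c')
        include_dirs = ['c-blosc/blosc']
    return Extension(
        'numcodecs.blosc',
        sources=sources + blosc_sources,
        include_dirs=include_dirs,
        # link the system library only when the bundled sources are absent
        libraries=[] if blosc_sources else ['blosc'],
    )

The zstd and lz4 extensions would follow the same pattern.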
Yeah, definitely understand the value of unbundling. However this was impractical until recently, as there were no Blosc wheels. As that has now changed ( https://github.com/zarr-developers/numcodecs/issues/262 ), agree it makes sense to use external dependencies.
To solve this, would actually go about it a bit differently. In particular, would change the codecs to use 3rd-party libraries in the code and drop the Blosc submodule.
Just to add, PR ( https://github.com/zarr-developers/numcodecs/pull/274 ) started down this path.
Hmmmm, I just got bit by this since I was trying to compile with snappy support. I guess there is some more work to do on the unbundling front.
If the system blosc has snappy as an available codec:
--- numcodecs-0.11.0.orig/numcodecs/tests/test_blosc.py
+++ numcodecs-0.11.0/numcodecs/tests/test_blosc.py
@@ -155,10 +155,11 @@ def test_compress_complib(use_threads):
     }
     blosc.use_threads = use_threads
     for cname in blosc.list_compressors():
-        enc = blosc.compress(arr, cname.encode(), 1, Blosc.NOSHUFFLE)
-        complib = blosc.cbuffer_complib(enc)
-        expected_complib = expected_complibs[cname]
-        assert complib == expected_complib
+        if cname in expected_complibs:
+            enc = blosc.compress(arr, cname.encode(), 1, Blosc.NOSHUFFLE)
+            complib = blosc.cbuffer_complib(enc)
+            expected_complib = expected_complibs[cname]
+            assert complib == expected_complib
     with pytest.raises(ValueError):
         # capitalized cname
         blosc.compress(arr, b'LZ4', 1)
Note that the package cramjam offers a very nice, simple, and fast interface to several byte-compression algorithms (zstd, lz4, snappy, ...) in a single statically compiled extension lib. It is the only compression package used by fastparquet. cramjam is not blosc, but we may be able to simplify a lot of things with it.
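For reference, a minimal sketch of what cramjam usage looks like, based on its documented per-codec compress/decompress API (treat the exact signatures as an assumption; this is not a numcodecs API):

# Minimal sketch of cramjam usage. Each codec lives in its own
# submodule and exposes the same compress/decompress interface.
import cramjam

data = b"some bytes worth compressing" * 1000

compressed = cramjam.zstd.compress(data)        # returns a cramjam.Buffer
restored = cramjam.zstd.decompress(compressed)
assert bytes(restored) == data

# Switching algorithms is just a submodule change, e.g. snappy:
assert bytes(cramjam.snappy.decompress(cramjam.snappy.compress(data))) == data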
Think we have had this discussion before ( https://github.com/zarr-developers/numcodecs/issues/314#issuecomment-1079336948 ). This is really not pressing since we started building Conda & wheel binaries.
Think there are better ways we can spend our time in Zarr development. Though don't want to discourage others if they really want to dig in. PRs welcome.