numcodecs

How to unbundle blosc, zstd, lz4

Open bnavigator opened this issue 4 years ago • 6 comments

From a distribution point of view, it is undesirable to bundle outdated copies of standard system libraries into packages. Here is a quick guide on how to unbundle blosc, zstd and lz4 from numcodecs and use the system libraries instead:

  1. Delete the c-blosc subdirectory.
  2. Patch setup.py to link the extensions against the system libraries:
Index: numcodecs-0.7.2/setup.py
===================================================================
--- numcodecs-0.7.2.orig/setup.py
+++ numcodecs-0.7.2/setup.py
@@ -112,6 +112,7 @@ def blosc_extension():
         Extension('numcodecs.blosc',
                   sources=sources + blosc_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if blosc_sources else ['blosc'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),
@@ -152,6 +153,7 @@ def zstd_extension():
         Extension('numcodecs.zstd',
                   sources=sources + zstd_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if zstd_sources else ['zstd'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),
@@ -185,6 +187,7 @@ def lz4_extension():
         Extension('numcodecs.lz4',
                   sources=sources + lz4_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if lz4_sources else ['lz4'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),

The modification to setup.py would need some more work, but maybe you want to consider making it possible to select building against the system libraries through an environment variable, e.g. USE_SYSTEM_LIBS=1.
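A minimal sketch of how such a switch could look, assuming the variable is named USE_SYSTEM_LIBS as suggested above. The helper names `use_system_libs` and `extension_kwargs`, and the set of accepted truthy values, are hypothetical and not part of numcodecs' setup.py:

```python
import os

# Values treated as "on"; this particular set is an assumption for the sketch.
TRUTHY = {"1", "true", "yes", "on"}


def use_system_libs(environ=None):
    """Return True when USE_SYSTEM_LIBS is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get("USE_SYSTEM_LIBS", "").strip().lower() in TRUTHY


def extension_kwargs(libname, bundled_sources, bundled_include_dirs, environ=None):
    """Pick the per-library build inputs for one codec extension.

    With the switch on, no bundled C sources are compiled and the
    extension links against the system library `libname`. Otherwise
    the bundled sources and headers are used and nothing extra is linked.
    """
    if use_system_libs(environ):
        return {"sources": [], "include_dirs": [], "libraries": [libname]}
    return {
        "sources": list(bundled_sources),
        "include_dirs": list(bundled_include_dirs),
        "libraries": [],
    }


# Possible use inside blosc_extension() (hypothetical wiring):
#   kw = extension_kwargs("blosc", blosc_sources, include_dirs)
#   Extension("numcodecs.blosc",
#             sources=sources + kw["sources"],
#             include_dirs=kw["include_dirs"],
#             libraries=kw["libraries"], ...)
```

Unlike the patch above, which infers the choice from whether the bundled sources are present, this makes the choice explicit, so the c-blosc directory would not have to be deleted first.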

This also resolves the problems reported in #215.

bnavigator avatar Dec 28 '20 13:12 bnavigator

Yeah, I definitely understand the value of unbundling. However, this was impractical until recently, as there were no Blosc wheels. As that has now changed ( https://github.com/zarr-developers/numcodecs/issues/262 ), I agree it makes sense to use external dependencies.

To solve this, I would actually go about it a bit differently. In particular, I would change the codecs to use third-party libraries in the code and drop the Blosc submodule.

jakirkham avatar Dec 30 '20 15:12 jakirkham

Just to add: PR ( https://github.com/zarr-developers/numcodecs/pull/274 ) started down this path.

jakirkham avatar Jul 28 '21 03:07 jakirkham

Hmm, I just got bitten by this, since I was trying to compile with snappy support. I guess there is some more work to do on the unbundling front.

hmaarrfk avatar Sep 29 '21 00:09 hmaarrfk

If the system blosc has snappy available as a codec, the test has to skip compressors that are not listed in expected_complibs:

--- numcodecs-0.11.0.orig/numcodecs/tests/test_blosc.py
+++ numcodecs-0.11.0/numcodecs/tests/test_blosc.py
@@ -155,10 +155,11 @@ def test_compress_complib(use_threads):
     }
     blosc.use_threads = use_threads
     for cname in blosc.list_compressors():
-        enc = blosc.compress(arr, cname.encode(), 1, Blosc.NOSHUFFLE)
-        complib = blosc.cbuffer_complib(enc)
-        expected_complib = expected_complibs[cname]
-        assert complib == expected_complib
+        if cname in expected_complibs:
+            enc = blosc.compress(arr, cname.encode(), 1, Blosc.NOSHUFFLE)
+            complib = blosc.cbuffer_complib(enc)
+            expected_complib = expected_complibs[cname]
+            assert complib == expected_complib
     with pytest.raises(ValueError):
         # capitalized cname
         blosc.compress(arr, b'LZ4', 1)
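The guard in the patch above silently skips any compressor the test has no expectation for. A small standalone sketch of the same idea, which also reports what was skipped, might look like this. The function name `split_compressors` and the sample mapping are illustrative stand-ins, not taken from the numcodecs test suite:

```python
def split_compressors(available, expected_complibs):
    """Separate compressors the test knows about from those it does not.

    `available` stands in for what blosc.list_compressors() reports on a
    given system; `expected_complibs` maps compressor name to the expected
    library name. Names missing from the mapping (e.g. snappy from a
    system blosc) are returned separately instead of raising KeyError.
    """
    checked = [name for name in available if name in expected_complibs]
    skipped = [name for name in available if name not in expected_complibs]
    return checked, skipped


# Illustrative data only: a system blosc that also offers snappy.
available = ["blosclz", "lz4", "snappy"]
expected = {"blosclz": "BloscLZ", "lz4": "LZ4"}

checked, skipped = split_compressors(available, expected)
print("checked:", checked)
print("skipped:", skipped)
```

Keeping the skipped names visible makes it easier to notice when a system library exposes a codec that the test suite does not cover.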

bnavigator avatar Jan 12 '23 16:01 bnavigator

Note that the cramjam package offers a simple, fast interface to several byte compression algorithms (zstd, lz4, snappy, ...) in a single statically compiled extension library. It is the only compression package used by fastparquet. cramjam is not blosc, but we may be able to simplify a lot of things with it.

martindurant avatar Jan 12 '23 16:01 martindurant

I think we have had this discussion before ( https://github.com/zarr-developers/numcodecs/issues/314#issuecomment-1079336948 ). This is really not pressing since we started building Conda and wheel binaries.

I think there are better ways we can spend our time in Zarr development, though I don't want to discourage others if they really want to dig in. PRs welcome.

jakirkham avatar Jan 12 '23 23:01 jakirkham