
Can Cython work with an extern C API that uses one parameter as both the input and the output buffer?

Open halehawk opened this issue 2 years ago • 11 comments

This is not a numcodecs bug. I am writing some Cython code that calls extern C APIs, and I ran into a problem: the C API uses a single parameter as both the input and the output buffer. Can Cython work with a C API like this, passing in a pointer to the input array and reading the output array back through the same pointer? In particular, I would like to use the Buffer type and its pointer as defined in numcodecs. @jakirkham

halehawk avatar May 31 '23 21:05 halehawk

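You can certainly do this. It means, however (if I understood correctly), that the C lib is allocating the buffer memory. You would probably turn it into a buffer using PyMemoryView_FromMemory (https://docs.python.org/3/c-api/memoryview.html#c.PyMemoryView_FromMemory). You would need a Cython class object to hold the pointer and free it in __dealloc__ (https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html#finalization-methods-dealloc-and-del) when Python collects the object. This obviously makes life more complicated and prone to crashes.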
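A rough Python analogue of that ownership pattern, written with ctypes rather than Cython so that it runs as-is. This is only a sketch under assumptions: the class name OwnedCBuffer is hypothetical, and PyMem_RawMalloc / PyMem_RawFree stand in for whatever allocator the C library really uses. In Cython, the release step would sit in the __dealloc__ of a cdef class instead.

```python
import ctypes

PyBUF_WRITE = 0x200  # flag value defined in CPython's headers

# Declare the signatures of the CPython C-API functions called through ctypes.
_api = ctypes.pythonapi
_api.PyMem_RawMalloc.restype = ctypes.c_void_p
_api.PyMem_RawMalloc.argtypes = [ctypes.c_size_t]
_api.PyMem_RawFree.restype = None
_api.PyMem_RawFree.argtypes = [ctypes.c_void_p]
_api.PyMemoryView_FromMemory.restype = ctypes.py_object
_api.PyMemoryView_FromMemory.argtypes = [
    ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_int,
]


class OwnedCBuffer:
    """Hypothetical owner of a block of C-allocated memory."""

    def __init__(self, size):
        self.ptr = None
        self.size = size
        # Stand-in for memory handed back by the C library.
        self.ptr = _api.PyMem_RawMalloc(size)
        if not self.ptr:
            raise MemoryError("allocation failed")

    def view(self):
        """Expose the memory as a writable memoryview without copying."""
        if self.ptr is None:
            raise ValueError("buffer already released")
        return _api.PyMemoryView_FromMemory(self.ptr, self.size, PyBUF_WRITE)

    def close(self):
        """Free the memory exactly once."""
        if self.ptr is not None:
            _api.PyMem_RawFree(self.ptr)
            self.ptr = None

    def __del__(self):
        self.close()


buf = OwnedCBuffer(4)
mv = buf.view()
mv[:] = b"abcd"   # a C function could fill the memory the same way
data = bytes(mv)  # copy the result out while the memory is still valid
mv.release()
buf.close()
```

The memoryview does not keep the owner alive, so any view still in use after close() would point at freed memory. That is the crash risk mentioned above.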

martindurant avatar Jun 01 '23 12:06 martindurant

Thank you, I didn't know I could use fromMemory. Currently I copy the input numpy array into an output numpy array and pass the output array, wrapped in the Buffer type, to the C API. That works. I will try fromMemory later.
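The copy-first approach can be sketched in plain Python as follows. It assumes a bytearray in place of the numpy array, and c_api_inplace is a hypothetical stand-in for the real C function, which is not shown in this thread.

```python
import ctypes


def c_api_inplace(ptr, n):
    """Hypothetical stand-in for a C function that reads and overwrites ptr[0:n]."""
    for i in range(n):
        ptr[i] = (ptr[i] + 1) % 256


def encode(data):
    # Copy the input so the caller's data is never modified.
    out = bytearray(data)
    # from_buffer shares memory with `out` (no second copy), which is
    # how the same pointer serves as both input and output.
    c_buf = (ctypes.c_ubyte * len(out)).from_buffer(out)
    c_api_inplace(c_buf, len(out))
    return bytes(out)


src = b"\x00\x01\xff"
result = encode(src)
```

The one extra copy costs memory and time, but the input stays intact and Python owns all the memory involved.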

halehawk avatar Jun 01 '23 19:06 halehawk

Copying is, of course, the safer approach :)

martindurant avatar Jun 01 '23 19:06 martindurant

I have another question; it would be great if you could help. I want to add another compressor, SPERR, to numcodecs. Following the approach used for blosc, I added SPERR as a submodule and compiled the SPERR sources together with the Python binding (sperr.pyx) in setup.py. This works, and I can use it to compress data successfully, but the results differ from what I get when I compress with SPERR's own C command-line utility. I tried to make sure all C-related flags and options were the same, yet the compress/decompress results are still not bit-for-bit identical. So I changed the build: now I compile only sperr.pyx and link it against the SPERR C library, and then the results match bit for bit. Do you know how blosc handles this? Or are the results expected to differ?

halehawk avatar Jun 01 '23 20:06 halehawk

You mean https://github.com/NCAR/numcodecs_sperr ?

I believe the submodule tactic is complex and there have been conversations about avoiding it. There isn't a real reason for such a dependency to be included directly in numcodecs, rather than in a separate package referenced by an entrypoint.
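As a sketch of what "referenced by an entrypoint" could look like: the separate package declares an entry point, and the host looks it up at import time. The group name "numcodecs.codecs" is, to my understanding, the one numcodecs scans. The package and class names in the comment are hypothetical.

```python
from importlib.metadata import entry_points

# Hypothetical declaration in the separate package's pyproject.toml:
#   [project.entry-points."numcodecs.codecs"]
#   sperr = "numcodecs_sperr:Sperr"


def discover_codec_entry_points(group="numcodecs.codecs"):
    """Return {name: "module:attr"} for every installed entry point in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):   # Python 3.10 and later
        found = eps.select(group=group)
    else:                        # Python 3.8/3.9 return a dict of lists
        found = eps.get(group, [])
    return {ep.name: ep.value for ep in found}


available = discover_codec_entry_points()
```

With this arrangement, nothing needs to be compiled inside numcodecs itself; installing the separate package is enough for the codec to be found.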

Some codecs work by compiling the low-level C stuff into a binary and separate python bindings, which works particularly well with conda, which happily installs binary non-python dependencies. The cython part then only needs include statements.

martindurant avatar Jun 01 '23 20:06 martindurant

Yes, this is the compressor I intend to add to numcodecs. It can compress data to within a specified absolute error, such as 0.01 for data that is around that range. The PSNR mode is good too. We have also developed a log filter and other filters to apply before compression. I wrote a Python binding for the log filter and call it from the SPERR encode step, which works well for data that spans a large range. Do you think I can create a PR for this compressor in numcodecs?

halehawk avatar Jun 01 '23 20:06 halehawk

Do you think I can create a PR for this compressor in numcodecs?

In principle it can be OK, but I think you would find resistance to adding extra compile complexity or a submodule to this package directly. Is there a reason not to make it a separate package?

martindurant avatar Jun 01 '23 20:06 martindurant

I don't know how to make it a separate package officially. Since SPERR is a C/C++ package, where can I publish libSPERR.so and then connect it to sperr.pyx which will be in numcodecs?

halehawk avatar Jun 01 '23 20:06 halehawk

Two options:

  • sperr.pyx does not need to be in numcodecs, it can live in the same package as the C code
  • you can distribute your .so in a python wheel (hard) or a conda package (easy, but not as widely used; common in scientific scenarios, though)

martindurant avatar Jun 01 '23 20:06 martindurant

  • sperr.pyx does not need to be in numcodecs, it can live in the same package as the C code

How can I use it with zarr the same way as zstd or lz4?

  • you can distribute your .so in a python wheel (hard) or a conda package (easy, but not as widely used; common in scientific scenarios, though)

zfp currently works this way: it has two repos to maintain, one for the zfp C code and one for zfp-wheel. How can I set this up so that there is less to maintain?

halehawk avatar Jun 01 '23 21:06 halehawk

Perhaps check out what imagecodecs does. It includes all the C code in the repo, with bindings that expose both the compress/uncompress functions and numcodecs codecs. It shows how you can call register at runtime or establish entrypoints for your codec.
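A minimal sketch of the runtime-registration route, not taken from imagecodecs. It assumes zlib as a stand-in for the real compressor, and DemoCodec and its codec_id are hypothetical names. The try/except fallback only exists so the sketch still runs where numcodecs is not installed.

```python
import zlib

try:
    from numcodecs.abc import Codec
    from numcodecs.registry import register_codec
except ImportError:
    # Fallback stubs so the sketch runs without numcodecs installed.
    Codec = object

    def register_codec(cls, codec_id=None):
        pass


class DemoCodec(Codec):
    """Hypothetical codec living outside numcodecs; zlib stands in for SPERR."""

    codec_id = "demo_zlib"  # hypothetical id

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        # Accept any object that exposes the buffer protocol.
        return zlib.compress(bytes(memoryview(buf)), self.level)

    def decode(self, buf, out=None):
        data = zlib.decompress(bytes(memoryview(buf)))
        if out is not None:
            # Write the result into the caller-provided contiguous buffer.
            memoryview(out).cast("B")[: len(data)] = data
            return out
        return data


# Make the codec discoverable by its id at runtime.
register_codec(DemoCodec)

codec = DemoCodec(level=3)
raw = b"hello" * 10
roundtrip = codec.decode(codec.encode(raw))
```

Once registered, the codec can be looked up by its id like the built-in ones, which is what lets an external codec be used alongside zstd or lz4.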

martindurant avatar Jun 02 '23 00:06 martindurant