libnpy icon indicating copy to clipboard operation
libnpy copied to clipboard

Support for reading/writing npz

Open WilliamTambellini opened this issue 3 years ago • 5 comments

Hello @llohse Would you accept a PR in order to support reading/writing npz (zip of npy files) ?

WilliamTambellini avatar Sep 15 '22 21:09 WilliamTambellini

Thanks for getting in touch and offering to contribute. Generally, I would be happy to add such functionality. However, libnpy currently is fairly compact and also does not have any external dependencies and I would like to keep it that way.

Do you see a way to add npz support without adding too much more code? As for the dependencies, it would need proper preprocessor checks so that npy support does not depend on any zip library.

In any case, feel free to open a PR so that I can have a look.

llohse avatar Sep 16 '22 15:09 llohse

Hi @llohse Tks for the reply. The only realistic way to read/write zip files is to use libz ( " -lz " on Linux) which is a standard lib available on all OS. Usually found with: find_package(ZLIB) eg https://github.com/rogersce/cnpy/blob/4e8810b1a8637695171ed346ce68f6984e585ef4/CMakeLists.txt#L12

Do you see a way to add npz support without adding too much more code?

I dont think it would be "too much" more code depending on how you define "too much": eg: https://github.com/rogersce/cnpy/blob/4e8810b1a8637695171ed346ce68f6984e585ef4/cnpy.cpp#L230

it would need proper preprocessor checks so that npy support does not depend on any zip library

Adding "ifdef HAVE_ZLIB" is ofcourse doable inorder to support npz only if zlib has been found.

feel free to open a PR so that I can have a look.

On our side, we cannot offer to prepare a PR before we are 100% sure you ll accept it so let's discuss further first. Tks.

WilliamTambellini avatar Sep 16 '22 19:09 WilliamTambellini

Hi @WilliamTambellini Thank you for the additional information and thanks again for offering your contribution. I would be glad to make this work.

libz seems "standard" enough. Yet, as far as I can see, it is a C-style library. I try to maintain a modern C++ codebase for libnpy, currently requiring C++14. Adding static analysis to the CI is high on my to-do list. The challenge will be to use zlib without regressions in terms of modern code style.

Moreover, I would like to keep libnpy as lightweight as possible and avoid forcing any hard dependencies, build system, or extra linker arguments by default. How do you feel about an ifdef WITH_NPZ? This could be defined depending on whether ZLIB is found or not by the build system or set manually if not using any build system.

The code examples look simple enough to make it work in a few 100 LOC. That would be fine for me.

What functionality/API do you have in mind? For me, the mvp should support

  • loading a specific array/variable from an npz file
  • saving a map of names and arrays into an npz file I would consider enumerating the contents as a nice extra feature. Loading all arrays from an npz file into a dynamic structure seems out of scope to me.

Regardless, I was planning to add a new API to libnpy, changing the parameters and return values to work with npy_data as defined here https://github.com/llohse/libnpy/blob/master/tests/test-read.cpp#L29. This could be used for npz support, too.

llohse avatar Sep 19 '22 14:09 llohse

Hi @llohse

  • C vs C++: there is a cpp api to play with zip archives using boost: https://stackoverflow.com/questions/61081287/build-boostiostreams-with-zlib but I doubt you d want to add a dep with boost
  • How do you feel about an ifdef WITH_NPZ? I proposed "ifdef HAVE_ZLIB" which is required to play with zip files anyway and autoset by many build systems
  • What functionality/API do you have in mind? MVP to read npz first, then later to write ?

WilliamTambellini avatar Feb 14 '23 16:02 WilliamTambellini

* C vs C++: there is a cpp api to play with zip archives using boost:
  https://stackoverflow.com/questions/61081287/build-boostiostreams-with-zlib
  but I doubt you d want to add a dep with boost

Yes, I would not want to depend on boost.

* How do you feel about an ifdef WITH_NPZ?
  I proposed "ifdef HAVE_ZLIB" which is required to play with zip files anyway and autoset by many build systems

Sounds good.

* What functionality/API do you have in mind?
  MVP to read npz first, then later to write ?

Sounds good.

llohse avatar Feb 24 '23 08:02 llohse