
add VLenNDArray

Open sofroniewn opened this issue 6 years ago • 13 comments

This PR closes #199 by adding support for ragged nD arrays, where each array can be a different shape and dimensionality. It does this using the scheme described in #199.

Its usage is as follows:

import numcodecs
import numpy as np
x = np.array([[1, 3, 5], [[4, 3], [2, 1]], [[7, 9]]], dtype='object')
codec = numcodecs.VLenNDArray('<i4')
codec.decode(codec.encode(x))
array([array([1, 3, 5], dtype=int32), array([[4, 3], [2, 1]], dtype=int32),
    array([[7, 9]], dtype=int32)], dtype=object)
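
For readers who want a feel for how a ragged nD codec can work, here is a minimal NumPy sketch. It is not the code in this PR and not necessarily the scheme from #199: the helper names `encode_ragged` and `decode_ragged` and the byte layout (item count, then for each item the number of dimensions, the shape, and the raw data) are assumptions made purely for illustration.

```python
import struct

import numpy as np


def encode_ragged(arrays, dtype="<i4"):
    """Pack arrays of differing shape and dimensionality into one bytes object.

    Hypothetical layout: uint32 item count, then for each item a uint32 ndim,
    ndim uint32 dimension sizes, and the C-ordered data bytes.
    """
    parts = [struct.pack("<I", len(arrays))]
    for item in arrays:
        # ascontiguousarray returns at least a 1-D array, so scalars become shape (1,).
        arr = np.ascontiguousarray(item, dtype=dtype)
        parts.append(struct.pack("<I", arr.ndim))
        parts.append(struct.pack(f"<{arr.ndim}I", *arr.shape))
        parts.append(arr.tobytes())
    return b"".join(parts)


def decode_ragged(buf, dtype="<i4"):
    """Inverse of encode_ragged: return a 1-D object array of ndarrays."""
    dtype = np.dtype(dtype)
    (n_items,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    out = np.empty(n_items, dtype=object)
    for i in range(n_items):
        (ndim,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        shape = struct.unpack_from(f"<{ndim}I", buf, offset)
        offset += 4 * ndim
        count = int(np.prod(shape, dtype=np.int64))
        flat = np.frombuffer(buf, dtype=dtype, count=count, offset=offset)
        out[i] = flat.reshape(shape).copy()  # copy so the result is writable
        offset += count * dtype.itemsize
    return out


items = [np.array([1, 3, 5]), np.array([[4, 3], [2, 1]]), np.array([[7, 9]])]
decoded = decode_ragged(encode_ragged(items))
```

Storing the shape next to each item is what lets every element keep its own dimensionality; a real codec would also need to handle missing values and validate the buffer, which this sketch skips.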

I have not added tests yet, but will do so. I will adapt the tests from test_vlen_array.py.

Any comments on the implementation are appreciated. I'm pretty new to this code base, so may have made some wrong choices.

Oh, and I also seem to have a bunch of .c files that came along when I ran cythonize -a -i ./numcodecs/vlen_nd.pyx, which I may or may not have wanted to change. Any advice around those would be appreciated too.

TODO:

  • [x] Unit tests and/or doctests in docstrings
  • [ ] tox -e py37 passes locally
  • [ ] tox -e py27 passes locally
  • [ ] Docstrings and API docs for any new/modified user-facing classes and functions
  • [ ] Changes documented in docs/release.rst
  • [ ] tox -e docs passes locally
  • [ ] AppVeyor and Travis CI passes
  • [ ] Test coverage to 100% (Coveralls passes)

sofroniewn avatar Sep 06 '19 23:09 sofroniewn

I have now added tests in test_vlen_ndarray.py, based on the tests in test_vlen_array.py.

sofroniewn avatar Sep 07 '19 00:09 sofroniewn

hmm - tests are failing on from numcodecs.vlen_nd import VLenNDArray, but that works fine locally. I'm not quite sure what's going on there.

sofroniewn avatar Sep 07 '19 00:09 sofroniewn

Most tests pass now - I had to add the fixtures folder, fix the cython metadata in the .c file (which had included some paths on my computer) and I had to add a setup.py extension.

There's still one docstring test failing for py3.7 because of where the output gets split across lines.
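
One way to sidestep this kind of failure is to avoid printing the object array in the doctest and instead compare the decoded items with the originals, since the line wrapping of a printed array can differ between environments. The helper below is a hypothetical illustration (`ragged_equal` is not part of numcodecs or of this PR).

```python
import numpy as np


def ragged_equal(expected, actual):
    """Return True if two sequences hold arrays with equal shapes and values."""
    if len(expected) != len(actual):
        return False
    # np.array_equal is False when shapes differ, so shape is checked implicitly.
    return all(np.array_equal(e, a) for e, a in zip(expected, actual))


original = [np.array([1, 3, 5]), np.array([[4, 3], [2, 1]])]
round_tripped = [np.array([1, 3, 5], dtype="<i4"), np.array([[4, 3], [2, 1]], dtype="<i4")]
print(ragged_equal(original, round_tripped))  # True
```

A doctest written this way only has to match the single line `True`, so it does not depend on how the array repr is wrapped.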

sofroniewn avatar Sep 07 '19 17:09 sofroniewn

I also want to note that when I try and run pytest -v numcodecs locally I get the following error messages which I think pertain to parts of the codebase I'm not trying to interact with and were likely due to problems with my installation, which followed the procedure described in your contributing guide (including running pip install -r requirements_dev.txt and python setup.py build_ext --inplace, but without the virtual environment).

____________________________________________________________________________ ERROR collecting numcodecs/tests/test_blosc.py ____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_blosc.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_blosc.py:12: in <module>
    from numcodecs import blosc
E   ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so, 2): Symbol not found: _blosc_cbuffer_complib
E     Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so
E     Expected in: flat namespace
E    in /Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so
_____________________________________________________________________________ ERROR collecting numcodecs/tests/test_lz4.py _____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_lz4.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_lz4.py:9: in <module>
    from numcodecs.lz4 import LZ4
E   ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so, 2): Symbol not found: _LZ4_compressBound
E     Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so
E     Expected in: flat namespace
E    in /Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so
____________________________________________________________________________ ERROR collecting numcodecs/tests/test_zstd.py _____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_zstd.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_zstd.py:9: in <module>
    from numcodecs.zstd import Zstd
E   ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so, 2): Symbol not found: _ZSTD_compress
E     Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so
E     Expected in: flat namespace
E    in /Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so

sofroniewn avatar Sep 07 '19 17:09 sofroniewn

@sofroniewn Thank you for your effort in trying to support ragged nD arrays. Just want to ask if there is any update on this feature?

NumesSanguis avatar Mar 16 '20 01:03 NumesSanguis

No update. Maybe I can ping @jakirkham and @ryan-williams to take another look, help me get the tests passing, and resolve conflicts. I'll note that I think the conflicts and test failures come from problems with my dev environment, not from the code I'm trying to add.

sofroniewn avatar Mar 16 '20 02:03 sofroniewn

Hi @sofroniewn, just to apologise for not looking at this sooner.

Re the conflicting .c files, those will just be due to changes in non-essential information that gets output by cython and which depends on which computer the C files were generated on. I'd suggest just removing any changes to those C files from this PR.

alimanfoo avatar Mar 25 '20 11:03 alimanfoo

I also want to note that when I try and run pytest -v numcodecs locally I get the following error messages which I think pertain to parts of the codebase I'm not trying to interact with and were likely due to problems with my installation

These error messages are a bit odd; they suggest some problem with how the other extension modules were compiled. Afraid I haven't got anything very intelligent to suggest, other than cleaning out all the .so and .c files and trying a full build from scratch.

alimanfoo avatar Mar 25 '20 12:03 alimanfoo

@sofroniewn, by any chance, would you still have interest in finishing this PR? :)

ericpre avatar Mar 16 '22 19:03 ericpre

This idea could probably be superseded by the proposal for an awkward-zarr project for GSoC.

martindurant avatar Mar 16 '22 20:03 martindurant

So sorry, both, for dropping the ball on this. If there are already plans for this to be superseded, please press on with those, or if you have a contributor who wants to take over this PR, please go ahead. Thanks!!

sofroniewn avatar Mar 22 '22 04:03 sofroniewn

@martindurant Do you have a link to that GSoC proposal? I could only find this open issue, but I'm not sure if it's related: https://github.com/zarr-developers/community/issues/42

NumesSanguis avatar Mar 22 '22 06:03 NumesSanguis

@NumesSanguis here's the ideas-list.md and here's the Awkward Array project details.

sanketverma1704 avatar Mar 22 '22 11:03 sanketverma1704