add VLenNDArray
This PR closes #199 by adding support for ragged nD arrays, where each array can be a different shape and dimensionality. It does this using the scheme described in #199.
It's usage is as follows:
import numcodecs
import numpy as np
x = np.array([[1, 3, 5], [[4, 3], [2, 1]], [[7, 9]]], dtype='object')
codec = numcodecs.VLenNDArray('<i4')
codec.decode(codec.encode(x))
array([array([1, 3, 5], dtype=int32), array([[4, 3], [2, 1]], dtype=int32),
array([[7, 9]]], dtype=int32)], dtype=object)
I have not added tests yet, but will do so. I will adapt the tests from test_vlen_array.py.
Any comments on the implementation are appreciated. I'm pretty new to this code base, so may have made some wrong choices.
Oh and I also seem to have a bunch of .c files the came when I ran cythonize -a -i ./numcodecs/vlen_nd.pyx that I may or may not have wanted to change, any advice around those would be appreciated too.
TODO:
- [x] Unit tests and/or doctests in docstrings
- [ ]
tox -e py37passes locally - [ ]
tox -e py27passes locally - [ ] Docstrings and API docs for any new/modified user-facing classes and functions
- [ ] Changes documented in docs/release.rst
- [ ]
tox -e docspasses locally - [ ] AppVeyor and Travis CI passes
- [ ] Test coverage to 100% (Coveralls passes)
I have now added tests in test_vlen_ndarray.py based on the tests for test_vlen_array.py
hmm - tests are failing on from numcodecs.vlen_nd import VLenNDArray, but that works fine locally. I'm not quite sure what's going on there.
Most tests pass now - I had to add the fixtures folder, fix the cython metadata in the .c file (which had included some paths on my computer) and I had to add a setup.py extension.
There's still one doc string test failing for py3.7 because of where the new-line gets split.
I also want to note that when I try and run pytest -v numcodecs locally I get the following error messages which I think pertain to parts of the codebase I'm not trying to interact with and were likely due to problems with my installation - which followed the procedure described in your contributing guide (inculding running pip install -r requirements_dev.txt and python setup.py build_ext --inplace, but without the virtual environment)
____________________________________________________________________________ ERROR collecting numcodecs/tests/test_blosc.py ____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_blosc.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_blosc.py:12: in <module>
from numcodecs import blosc
E ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so, 2): Symbol not found: _blosc_cbuffer_complib
E Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so
E Expected in: flat namespace
E in /Users/nicholassofroniew/Github/numcodecs/numcodecs/blosc.cpython-37m-darwin.so
_____________________________________________________________________________ ERROR collecting numcodecs/tests/test_lz4.py _____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_lz4.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_lz4.py:9: in <module>
from numcodecs.lz4 import LZ4
E ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so, 2): Symbol not found: _LZ4_compressBound
E Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so
E Expected in: flat namespace
E in /Users/nicholassofroniew/Github/numcodecs/numcodecs/lz4.cpython-37m-darwin.so
____________________________________________________________________________ ERROR collecting numcodecs/tests/test_zstd.py _____________________________________________________________________________
ImportError while importing test module '/Users/nicholassofroniew/Github/numcodecs/numcodecs/tests/test_zstd.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
numcodecs/tests/test_zstd.py:9: in <module>
from numcodecs.zstd import Zstd
E ImportError: dlopen(/Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so, 2): Symbol not found: _ZSTD_compress
E Referenced from: /Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so
E Expected in: flat namespace
E in /Users/nicholassofroniew/Github/numcodecs/numcodecs/zstd.cpython-37m-darwin.so
@sofroniewn Thank you for your effort in trying to support ragged nD arrays. Just want to ask if there is any update on this feature?
No update - maybe I can ping @jakirkham and @ryan-williams to take another look / help me get the tests passing / resolve conflicts - i'll note that I think the conflicts / test failures come from the failures of my dev environment not the code I'm trying to add
Hi @sofroniewn, just to apologise for not looking at this sooner.
Re the conflicting .c files, those will just be due to changes in non-essential information that gets output by cython and which depends on which computer the C files where generated on. I'd suggest to just remove any changes to those C files from this PR.
I also want to note that when I try and run
pytest -v numcodecslocally I get the following error messages which I think pertain to parts of the codebase I'm not trying to interact with and were likely due to problems with my installation
These error messages are a bit odd, they suggest some problem with how the other extension modules were compiled. Afraid I haven't got anything very intelligent to suggest, other than cleaning out all the .so and .c files and trying a full build from scratch.
@sofroniewn, by any chance, would you still have interest in finishing this PR? :)
This idea could probably be superseded by the proposal for an awkward-zarr project for GSoC.
So sorry both for dropping the ball on this - if there are already plans for this to be superseded please press on with those, or if you have a contributor who wants to take this over PR please take it over. Thanks!!
@martindurant Do you have a link to that GSoC proposal? I could only find this open issue, but not sure if that's related?: https://github.com/zarr-developers/community/issues/42
@NumesSanguis here's the ideas-list.md and here's the Awkward Array project details.