Results 1078 comments of Tom Augspurger

One note here that's relevant for https://github.com/zarr-developers/zarr-python/pull/2036 and https://github.com/pydata/xarray/issues/9515: the default codec can depend on the dtype of the array:

```python
# zarr-python 2.18.3
>>> g = zarr.group(store={})
>>> g.create(name="b",...
```

I'd like to keep `.info` as a property, like it was in v2. I think we can include fields requiring computation in an `info_full` or `info_complete` method.
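The split can be sketched with a toy class (the `info` / `info_complete` names come from the comment above; the fields and internals here are hypothetical, not zarr's actual API):

```python
class Array:
    """Toy sketch: cheap metadata as a property, expensive fields in a method."""

    def __init__(self, shape, itemsize):
        self._shape = shape
        self._itemsize = itemsize

    @property
    def info(self):
        # Cheap: uses only metadata already in memory; no I/O or computation.
        return {"shape": self._shape, "itemsize": self._itemsize}

    def info_complete(self):
        # Expensive: fields requiring computation (for zarr this might mean
        # listing the store to count initialized chunks, for example).
        out = dict(self.info)
        nbytes = self._itemsize
        for n in self._shape:
            nbytes *= n
        out["nbytes"] = nbytes
        return out
```

Keeping `.info` a property preserves the v2 feel of `arr.info`, while the explicit call to `info_complete()` signals that it may be slow.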

[fa18b9b](https://github.com/rapidsai/cudf/pull/19088/commits/fa18b9b80dc1cb2459fdde5894a93d400d3b265e) should have all the suggestions (lightly adapted).

https://github.com/TomAugspurger/zarr-python-memory-benchmark/blob/4039ba687452d65eef081bce1d4714165546422a/sol.py#L41 has a POC for using `readinto` to read an uncompressed zarr dataset into a pre-allocated buffer. https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/3567246b852d7adacbc10f32a58b0b3f6ac3d50b/reports/memray-flamegraph-sol-read-uncompressed.html shows that it takes ~exactly the size of the output ndarray (so...
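A minimal, zarr-free sketch of the `readinto` idea: fill a pre-allocated buffer directly from a file-like object, so no intermediate `bytes` object of chunk size is created (the helper name and file contents are made up for the example):

```python
import io


def read_chunk_into(f, out: memoryview) -> int:
    """Read exactly len(out) bytes from f into a pre-allocated buffer.

    f.readinto() writes into `out` directly, avoiding the extra copy
    that f.read() followed by assignment would make.
    """
    n = 0
    while n < len(out):
        got = f.readinto(out[n:])
        if not got:
            raise EOFError("short read")
        n += got
    return n


# Usage: an 8-byte "chunk" read into a pre-allocated bytearray.
buf = bytearray(8)
src = io.BytesIO(b"\x01" * 8)
read_chunk_into(src, memoryview(buf))
```

With a real uncompressed zarr chunk the destination would be a slice of the output ndarray's buffer, which is why peak memory stays near the array size.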

Very cool!

> [from https://github.com/tomwhite/memray-array] Reads with no compression incur a single copy from local files, but two copies from S3. This seems to be because the S3 libraries read...

I looked into implementing this today and it'll be a decent amount of effort. There are some issues in the interface provided by the codec pipeline ABC (`read` takes an...

On the weekly call today, Davis asked whether we could do zero-copy read/decompression for variable-width/length types. For fixed-size types, we can derive that as `chunk.dtype.itemsize...
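For fixed-size dtypes the point can be shown with plain arithmetic (a hedged sketch; the helper name is made up, and variable-width types are modeled here as `itemsize=None`):

```python
import math


def decoded_nbytes(shape, itemsize):
    """Size of a decoded chunk, derivable up front for fixed-size dtypes.

    Knowing the size before decoding lets us pre-allocate the destination
    buffer and decompress directly into it (zero-copy). For variable-width
    types (e.g. vlen strings) the size depends on the data itself, so it
    can't be derived from shape and dtype alone.
    """
    if itemsize is None:
        return None  # unknown until the chunk is actually decoded
    return itemsize * math.prod(shape)
```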

Nice work. https://github.com/TomAugspurger/zarr-python/blob/tom/zero-copy-codec-pipeline/tests/test_memory_usage.py has the start of a test that uses tracemalloc to ensure no unexpected NumPy array allocations are made. This should enable us to verify we aren't unexpectedly...
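The tracemalloc approach can be sketched without zarr: measure peak allocation around an operation and compare it against what a zero-copy path should need (helper name and thresholds are illustrative, not from the linked test):

```python
import tracemalloc


def peak_allocated(fn):
    """Run fn and return the peak number of bytes Python allocated meanwhile."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()


buf = bytearray(1 << 20)  # 1 MiB source buffer
copying = peak_allocated(lambda: bytes(buf))       # full copy allocates ~1 MiB
viewing = peak_allocated(lambda: memoryview(buf))  # zero-copy view allocates ~nothing
```

A real test would wrap the codec pipeline's read path and fail if peak allocation exceeds the output array size plus a small tolerance.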

https://github.com/TomAugspurger/zarr-python/blob/tom/zero-copy-alt/simple.py has an implementation I've been hacking on. Compared to the current codec pipeline it's faster and uses less memory, but it isn't nearly feature-complete yet. ![](https://github.com/TomAugspurger/zarr-python/blob/tom/zero-copy-alt/simple.png?raw=true) I'm still analyzing...