Ryan Abernathey
My hunch would be that this is related to s3fs directory listing and caching. @martindurant any thoughts?
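For example, one quick way to test that hypothesis (the bucket path below is a placeholder) is to clear s3fs's listing cache and re-list:

```python
import s3fs

# Drop any cached directory listings, then re-list to force a
# fresh call to S3. "my-bucket/path" is just a placeholder.
fs = s3fs.S3FileSystem()
fs.invalidate_cache()           # discard cached listings
print(fs.ls("my-bucket/path"))  # fresh listing from S3
```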
An alternative to appending, if you know the final size of the array, would be to pre-allocate a large empty array and just write to the desired region. You could...
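Something like this minimal sketch (the shape, chunks, and path are illustrative assumptions):

```python
import numpy as np
import zarr

# Pre-allocate the full-size array once, up front.
z = zarr.open("example.zarr", mode="w", shape=(1000, 100),
              chunks=(100, 100), dtype="f8")

# Then write each piece into its region instead of appending.
z[200:300, :] = np.random.rand(100, 100)
```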
If you are using a Dask distributed cluster, you can use a [distributed Lock](http://distributed.dask.org/en/stable/api.html#distributed.Lock) to solve this problem. Communication between processes is handled by the Dask scheduler. In the Pangeo...
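For instance, a minimal sketch of the lock pattern, assuming a running distributed cluster (the lock name is arbitrary and the guarded body is a placeholder for whatever write you need to serialize):

```python
from distributed import Client, Lock

client = Client()           # connect to (or start) a cluster
lock = Lock("zarr-append")  # named lock, mediated by the scheduler

def guarded_write(i):
    with lock:   # only one worker holds the lock at a time,
        return i # even across processes and machines

results = client.gather(client.map(guarded_write, range(4)))
```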
> I think the biggest use case here is working with very large amounts of data as quickly as possible.

This is generally what everyone wants to do. But I'm not...
Yes, @martindurant's [filesystem_spec](https://github.com/martindurant/filesystem_spec/) is what turned me on to pyfilesystem! (They are discussing the similarities here: https://github.com/martindurant/filesystem_spec/issues/5) I don't particularly care _which_ abstract filesystem we pick--it's the principle of outsourcing this...
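To illustrate the principle, here is a minimal sketch using filesystem_spec: the same calling code works against different backends (the bucket name is a placeholder):

```python
import fsspec

# Identical API whether the bytes live on local disk or on S3.
fs_local = fsspec.filesystem("file")
fs_s3 = fsspec.filesystem("s3", anon=True)

print(fs_local.ls("/tmp"))    # same .ls() call...
print(fs_s3.ls("my-bucket"))  # ...regardless of backend
```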
@martindurant -- thanks for the clarifications. I misunderstood the thread over in filesystem_spec discussing the relationship with pyfilesystem. I thought they were more similar than they really are, and that...
I started playing with this today. As a first step, I am just trying to implement encoding / decoding of numpy data into caterva, as needed by numcodecs. But immediately...
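Roughly, the interface numcodecs expects looks like this sketch (the class name is hypothetical and the method bodies are pass-throughs, not a real Caterva binding):

```python
import numpy as np
from numcodecs.abc import Codec

class CatervaCodec(Codec):
    codec_id = "caterva"

    def encode(self, buf):
        # Real code would hand the buffer to Caterva for compression.
        return np.ascontiguousarray(buf).tobytes()

    def decode(self, buf, out=None):
        # Real code would decompress via Caterva; handling of the
        # optional `out` buffer is omitted in this sketch.
        return np.frombuffer(buf, dtype="u1")
```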
Thanks for the tips, Francesc. It sounds like we will probably have to create a Cython wrapper for Caterva in numcodecs, similar to what we currently [do with Blosc](https://github.com/zarr-developers/numcodecs/blob/master/numcodecs/blosc.pyx). Understanding...
> It is also a shame that you don't find the adoption of blosc2/caterva appealing because IMO it currently fulfills most of the requirements above

This is absolutely not true...
> in situations where it would make sense to perform operations on individual chunks in parallel.

In Python, we currently use Dask for these use cases. (Zarr was created with...
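For example, a minimal sketch of the per-chunk pattern (the path and sizes are illustrative):

```python
import dask.array as da
import numpy as np
import zarr

# Write a chunked zarr array, then wrap it in dask so each
# chunk becomes an independent task.
z = zarr.open("example.zarr", mode="w", shape=(1000, 1000),
              chunks=(100, 100), dtype="f8")
z[:] = np.random.rand(1000, 1000)

arr = da.from_zarr("example.zarr")   # one dask task per zarr chunk
result = arr.mean(axis=0).compute()  # chunks processed in parallel
```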