Ryan Abernathey
My hunch would be that this is related to s3fs directory listing and caching. @martindurant any thoughts?
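For example, one quick way to test that hypothesis (the bucket path below is a placeholder) is to clear s3fs's listing cache and re-list:

```python
import s3fs

# Drop any cached directory listings, then re-list to force a
# fresh call to S3. "my-bucket/path" is just a placeholder.
fs = s3fs.S3FileSystem()
fs.invalidate_cache()           # discard cached listings
print(fs.ls("my-bucket/path"))  # fresh listing from S3
```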
An alternative to appending, if you know the final size of the array, would be to pre-allocate a large empty array and just write to the desired region. You could...
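Something like this minimal sketch (the shape, chunks, and path are illustrative assumptions):

```python
import numpy as np
import zarr

# Pre-allocate the full-size array once, up front.
z = zarr.open("example.zarr", mode="w", shape=(1000, 100),
              chunks=(100, 100), dtype="f8")

# Then write each piece into its region instead of appending.
z[200:300, :] = np.random.rand(100, 100)
```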
If you are using a Dask distributed cluster, you can use a [distributed Lock](http://distributed.dask.org/en/stable/api.html#distributed.Lock) to solve this problem. Communication between processes is handled by the Dask scheduler. In the Pangeo...
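For instance, a minimal sketch of the lock pattern, assuming a running distributed cluster (the lock name is arbitrary and the guarded body is a placeholder for whatever write you need to serialize):

```python
from distributed import Client, Lock

client = Client()           # connect to (or start) a cluster
lock = Lock("zarr-append")  # named lock, mediated by the scheduler

def guarded_write(i):
    with lock:   # only one worker holds the lock at a time,
        return i # even across processes and machines

results = client.gather(client.map(guarded_write, range(4)))
```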
> I think the biggest use case here is working with very large amounts of data as quickly as possible.

This is generally what everyone wants to do. But I'm not...
Yes, @martindurant's [filesystem_spec](https://github.com/martindurant/filesystem_spec/) is what turned me on to pyfilesystem! (They are discussing the similarities here: https://github.com/martindurant/filesystem_spec/issues/5) I don't particularly care _which_ abstract filesystem we pick--it's the principle of outsourcing this...
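To illustrate the principle, here is a minimal sketch using filesystem_spec: the same calling code works against different backends (the bucket name is a placeholder):

```python
import fsspec

# Identical API whether the bytes live on local disk or on S3.
fs_local = fsspec.filesystem("file")
fs_s3 = fsspec.filesystem("s3", anon=True)

print(fs_local.ls("/tmp"))    # same .ls() call...
print(fs_s3.ls("my-bucket"))  # ...regardless of backend
```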
@martindurant -- thanks for the clarifications. I misunderstood the thread over in filesystem_spec discussing the relationship with pyfilesystem. I thought they were more similar than they really are, and that...
I started playing with this today. As a first step, I am just trying to implement encoding / decoding of numpy data into caterva, as needed by numcodecs. But immediately...
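Roughly, the interface numcodecs expects looks like this sketch (the class name is hypothetical and the method bodies are pass-throughs, not a real Caterva binding):

```python
import numpy as np
from numcodecs.abc import Codec

class CatervaCodec(Codec):
    codec_id = "caterva"

    def encode(self, buf):
        # Real code would hand the buffer to Caterva for compression.
        return np.ascontiguousarray(buf).tobytes()

    def decode(self, buf, out=None):
        # Real code would decompress via Caterva; handling of the
        # optional `out` buffer is omitted in this sketch.
        return np.frombuffer(buf, dtype="u1")
```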
Thanks for the tips, Francesc. It sounds like we will probably have to create a Cython wrapper for Caterva in numcodecs, similar to what we currently [do with Blosc](https://github.com/zarr-developers/numcodecs/blob/master/numcodecs/blosc.pyx). Understanding...
> It is also a shame that you don't find the adoption of blosc2/caterva appealing because IMO it currently fulfills most of the requirements above

This is absolutely not true...
> in situations where it would make sense to perform operations on individual chunks in parallel.

In Python, we currently use Dask for these use cases. (Zarr was created with...
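For example, a minimal sketch of the per-chunk pattern (the path and sizes are illustrative):

```python
import dask.array as da
import numpy as np
import zarr

# Write a chunked zarr array, then wrap it in dask so each
# chunk becomes an independent task.
z = zarr.open("example.zarr", mode="w", shape=(1000, 1000),
              chunks=(100, 100), dtype="f8")
z[:] = np.random.rand(1000, 1000)

arr = da.from_zarr("example.zarr")   # one dask task per zarr chunk
result = arr.mean(axis=0).compute()  # chunks processed in parallel
```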