strax
Distributed read-write oriented storage options
What's the problem?
The most developed storage option is the directory storage, but it is not designed for distributed access.
Proposed solution
On the backend side, there are a few interesting options:
- Use fsspec to abstract away file-system access.
- Delegate storage management to a package with a focus on distributed access such as zarr (high level) or partd (low level).
- Switch to a Mapping interface, combining zict interfaces and fsspec mappers to pipe data to and from arbitrary destinations through a consistent API.
- The backend should ideally support async-, thread-, and process-safe locking options.
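To make the Mapping idea concrete, here is a minimal stdlib-only sketch of the layering. It does not use fsspec or zict themselves: `DirectoryBytesStore` is a hypothetical stand-in for the key-to-bytes mapping that `fsspec.get_mapper` returns, and `SerializedMapping` is a hypothetical stand-in for the dump/load wrapper that `zict.Func` provides. The key names are made up, and the lock only covers threads within one process.

```python
import pickle
import tempfile
import threading
from collections.abc import MutableMapping
from pathlib import Path


class DirectoryBytesStore(MutableMapping):
    """Hypothetical key -> bytes store with one file per (flat) key."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # Guards writes and deletes between threads only; not process-safe.
        self._lock = threading.Lock()

    def __getitem__(self, key):
        try:
            return (self.root / key).read_bytes()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        with self._lock:
            tmp = self.root / (key + ".tmp")
            tmp.write_bytes(value)
            # Rename into place so readers never see a half-written file.
            tmp.replace(self.root / key)

    def __delitem__(self, key):
        with self._lock:
            try:
                (self.root / key).unlink()
            except FileNotFoundError:
                raise KeyError(key) from None

    def __iter__(self):
        # Skip temporary files left by in-progress writes.
        return (p.name for p in self.root.iterdir() if not p.name.endswith(".tmp"))

    def __len__(self):
        return sum(1 for _ in self)


class SerializedMapping(MutableMapping):
    """Hypothetical wrapper that applies dump on write and load on read."""

    def __init__(self, dump, load, store):
        self.dump, self.load, self.store = dump, load, store

    def __getitem__(self, key):
        return self.load(self.store[key])

    def __setitem__(self, key, value):
        self.store[key] = self.dump(value)

    def __delitem__(self, key):
        del self.store[key]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


def demo_roundtrip():
    """Write one chunk through both layers and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        chunks = SerializedMapping(pickle.dumps, pickle.loads, DirectoryBytesStore(tmp))
        chunks["run0-peaks-000000"] = {"time": [0, 10], "area": [1.5, 2.5]}
        return chunks["run0-peaks-000000"], sorted(chunks)
```

Because the frontend only ever sees a mapping, swapping the directory store for a remote one would not change the calling code. Process-safe or async locking would need a different mechanism than the `threading.Lock` used here.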
For frontend improvement: I think switching to a distributed index over our data would help a lot. In a distributed index there can be many copies of the index, each modified locally, and merge strategies define what happens when a copy finally pushes or pulls changes. The simplest strategy is to merge only changes that do not overlap. This would be very similar to a git repository, where there can be many branches but there is usually one master branch that most people pull from and only authorized people push to.
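As a rough illustration of the simplest merge strategy, the sketch below treats an index as a plain dict and does a three-way merge against a common base snapshot, refusing to merge if both sides touched the same key. The function names, the `MergeConflict` exception, and the example keys are all hypothetical; nothing like this exists in strax today.

```python
_MISSING = object()  # sentinel meaning "key is absent from this index copy"


class MergeConflict(Exception):
    """Raised when both copies changed the same key since the base snapshot."""


def diff(base, current):
    """Return {key: new_value} for keys that differ from base (_MISSING = deleted)."""
    changes = {}
    for key in base.keys() | current.keys():
        old = base.get(key, _MISSING)
        new = current.get(key, _MISSING)
        if old != new:
            changes[key] = new
    return changes


def merge_non_overlapping(base, ours, theirs):
    """Merge two index copies only if their changes touch disjoint keys."""
    our_changes = diff(base, ours)
    their_changes = diff(base, theirs)
    overlap = our_changes.keys() & their_changes.keys()
    if overlap:
        raise MergeConflict(sorted(overlap))
    merged = dict(base)
    for key, value in {**our_changes, **their_changes}.items():
        if value is _MISSING:
            merged.pop(key, None)  # the key was deleted on one side
        else:
            merged[key] = value
    return merged


# Example: one copy adds a run, the other updates an existing entry.
base = {"run0/peaks": "v1", "run0/records": "v1"}
ours = {**base, "run1/peaks": "v1"}
theirs = {**base, "run0/records": "v2"}
merged = merge_non_overlapping(base, ours, theirs)
```

This version is deliberately strict: it reports a conflict even when both sides made the identical change to a key. A more permissive strategy could accept identical changes, or defer to whoever is authorized to push to the master copy.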