zarr-python
zarr-python copied to clipboard
Basic working FsspecStore
This works.
- We don't have any list methods, do we need them?
- We don't have a bulk delete
- No exception handling is done here, which has been a thorny issue. Shall we do the same as in v2 with expected exceptions -> KeyError? I don't see in
Store
's API what the expectation is.
Fixes https://github.com/zarr-developers/zarr-python/issues/1757
TODO:
- [ ] Add unit tests and/or doctests in docstrings
- [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
- [ ] New/modified features documented in docs/tutorial.rst
- [ ] Changes documented in docs/release.rst
- [ ] GitHub Actions have all passed
- [ ] Test coverage is 100% (Codecov passes)
@jhamman @d-v-b , for discussion
Hello @martindurant! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2024-04-14 01:40:25 UTC
thanks for this @martindurant, I will see if I have time to play with this locally.
No exception handling is done here, which has been a thorny issue. Shall we do the same as in v2 with expected exceptions -> KeyError? I don't see in Store's API what the expectation is.
I don't think we have expectations at this point. I will try to get a distillation from the v2 issues around this topic. What do you think we should do here?
I see we have no storage tests, so don't know how to push this any further.
I will get some tests for you shortly
@martindurant - the test suite is in a better place now. Are you up for picking this up?
For testing ... I could set up a mock s3 or gcs with some pain and CI overhead. Also, I could wrap the fsspec memoryFS in async stuff, but that would not really be testing the async-ness and depends no me doing that correctly. None of the other stores are actually async, right?
@martindurant - which fsspec implementations have an async-api available?
For now, a mocked s3 backend seems like the way to go.
which fsspec implementations have an async-api
I think this is a complete list
- s3
- gcs
- azure (both blob and datalake2)
- http
- sshfs (not the builtin ssh/sftp)
- anaconda (released version is still sync)
The following can pass through async, if they wrap an async FS:
- generic
- reference
- dir/prefix
Got it. Makes sense. For now, let's target a mocked test against s3. We can add a few others as we approach a full release.
a mocked test against s3
Well, I would use the moto server as s3fs does, which is a real S3 implementation with most of the functionality of the real one (minus permissions, object lifetimes and other unimportant things).
I merged from v3 and the suggestions above. I cannot get mypy to play.
By the way, I would suggest that memoryview() and bytes() really ought to work as expected for a Buffer.
@martindurant -- thanks for pushing this forward...
I merged from v3 and the suggestions above. I cannot get mypy to play.
I'm hoping @dstansby can take a look... we can find a way!
By the way, I would suggest that memoryview() and bytes() really ought to work as expected for a Buffer.
cc @madsbk, @akshaysubr, @normanrz
I'll add moto-based s3 tests by Thursday
I put in a simple test to show it works, and will try to find time to hook in the existing StoreTests
Happy to take a look at typing, maybe once the tests are passing? Feel free to ping me again.
Are the tests failing due to dependency upgrades? Do we really need fsspec>2024
or could we relax that constraint?
Do we really need fsspec>2024
I recommend it.
I had thought these particular errors were already addressed
I had thought these particular errors were already addressed
Nope! https://github.com/zarr-developers/zarr-python/issues/1819
OK, so the problem with the tests, is that they call get/set in blocking sync code, but the store implementation works in async mode. The test instance is created in sync mode, and gets put on a dedicated fsspec IO thread; but then the async store on the main thread calls the same instance in async mode. Note that moto3 is also running on a thread, to complicate things. I am looking at it.
OK, so the problem with the tests, is that they call get/set in blocking sync code, but the store implementation works in async mode. The test instance is created in sync mode, and gets put on a dedicated fsspec IO thread; but then the async store on the main thread calls the same instance in async mode. Note that moto3 is also running on a thread, to complicate things. I am looking at it.
That's great insight, thank you. Please let me know if there's anything about the StoreTests
design that we should change to make this process simpler.
Highlighting this line, which works around tests passing offset-length for data which is b"". Python allows you to slice bytes like that (b""[1:2] == b""), but remote stores do not allow this.