zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

Basic working FsspecStore

Open martindurant opened this issue 10 months ago • 5 comments

This works.

  • We don't have any list methods, do we need them?
  • We don't have a bulk delete
  • No exception handling is done here, which has been a thorny issue. Shall we do the same as in v2 with expected exceptions -> KeyError? I don't see in Store's API what the expectation is.

Fixes https://github.com/zarr-developers/zarr-python/issues/1757

TODO:

  • [ ] Add unit tests and/or doctests in docstrings
  • [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
  • [ ] New/modified features documented in docs/tutorial.rst
  • [ ] Changes documented in docs/release.rst
  • [ ] GitHub Actions have all passed
  • [ ] Test coverage is 100% (Codecov passes)

martindurant avatar Apr 10 '24 20:04 martindurant

@jhamman @d-v-b , for discussion

martindurant avatar Apr 10 '24 20:04 martindurant

Hello @martindurant! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2024-04-14 01:40:25 UTC

pep8speaks avatar Apr 11 '24 19:04 pep8speaks

thanks for this @martindurant, I will see if I have time to play with this locally.

No exception handling is done here, which has been a thorny issue. Shall we do the same as in v2 with expected exceptions -> KeyError? I don't see in Store's API what the expectation is.

I don't think we have expectations at this point. I will try to get a distillation from the v2 issues around this topic. What do you think we should do here?

d-v-b avatar Apr 11 '24 20:04 d-v-b

I see we have no storage tests, so don't know how to push this any further.

martindurant avatar Apr 14 '24 01:04 martindurant

I will get some tests for you shortly

d-v-b avatar Apr 14 '24 08:04 d-v-b

@martindurant - the test suite is in a better place now. Are you up for picking this up?

jhamman avatar May 29 '24 03:05 jhamman

For testing ... I could set up a mock s3 or gcs with some pain and CI overhead. Also, I could wrap the fsspec memoryFS in async stuff, but that would not really be testing the async-ness and depends no me doing that correctly. None of the other stores are actually async, right?

martindurant avatar May 29 '24 20:05 martindurant

@martindurant - which fsspec implementations have an async-api available?

For now, a mocked s3 backend seems like the way to go.

jhamman avatar May 29 '24 21:05 jhamman

which fsspec implementations have an async-api

I think this is a complete list

  • s3
  • gcs
  • azure (both blob and datalake2)
  • http
  • sshfs (not the builtin ssh/sftp)
  • anaconda (released version is still sync)

The following can pass through async, if they wrap an async FS:

  • generic
  • reference
  • dir/prefix

martindurant avatar May 30 '24 00:05 martindurant

Got it. Makes sense. For now, let's target a mocked test against s3. We can add a few others as we approach a full release.

jhamman avatar May 30 '24 06:05 jhamman

a mocked test against s3

Well, I would use the moto server as s3fs does, which is a real S3 implementation with most of the functionality of the real one (minus permissions, object lifetimes and other unimportant things).

martindurant avatar May 30 '24 13:05 martindurant

I merged from v3 and the suggestions above. I cannot get mypy to play.

By the way, I would suggest that memoryview() and bytes() really ought to work as expected for a Buffer.

martindurant avatar Jun 04 '24 01:06 martindurant

@martindurant -- thanks for pushing this forward...

I merged from v3 and the suggestions above. I cannot get mypy to play.

I'm hoping @dstansby can take a look... we can find a way!

By the way, I would suggest that memoryview() and bytes() really ought to work as expected for a Buffer.

cc @madsbk, @akshaysubr, @normanrz

jhamman avatar Jun 04 '24 14:06 jhamman

I'll add moto-based s3 tests by Thursday

martindurant avatar Jun 04 '24 14:06 martindurant

I put in a simple test to show it works, and will try to find time to hook in the existing StoreTests

martindurant avatar Jun 04 '24 19:06 martindurant

Happy to take a look at typing, maybe once the tests are passing? Feel free to ping me again.

dstansby avatar Jun 05 '24 16:06 dstansby

Are the tests failing due to dependency upgrades? Do we really need fsspec>2024 or could we relax that constraint?

normanrz avatar Jun 05 '24 16:06 normanrz

Do we really need fsspec>2024

I recommend it.

I had thought these particular errors were already addressed

martindurant avatar Jun 05 '24 16:06 martindurant

I had thought these particular errors were already addressed

Nope! https://github.com/zarr-developers/zarr-python/issues/1819

jhamman avatar Jun 07 '24 16:06 jhamman

OK, so the problem with the tests, is that they call get/set in blocking sync code, but the store implementation works in async mode. The test instance is created in sync mode, and gets put on a dedicated fsspec IO thread; but then the async store on the main thread calls the same instance in async mode. Note that moto3 is also running on a thread, to complicate things. I am looking at it.

martindurant avatar Jun 10 '24 18:06 martindurant

OK, so the problem with the tests, is that they call get/set in blocking sync code, but the store implementation works in async mode. The test instance is created in sync mode, and gets put on a dedicated fsspec IO thread; but then the async store on the main thread calls the same instance in async mode. Note that moto3 is also running on a thread, to complicate things. I am looking at it.

That's great insight, thank you. Please let me know if there's anything about the StoreTests design that we should change to make this process simpler.

d-v-b avatar Jun 10 '24 19:06 d-v-b

Highlighting this line, which works around tests passing offset-length for data which is b"". Python allows you to slice bytes like that (b""[1:2] == b""), but remote stores do not allow this.

martindurant avatar Jun 11 '24 02:06 martindurant