zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

[WIP] Attempt to continue LRU cache for decoded chunks

Open croth1 opened this issue 3 years ago • 9 comments

Continuation attempt of #306 - do not merge yet - ~very likely still not correct behaviour when chunks get deleted from store by #738~.

Still have very limited knowledge of the code base - will require me to dig a bit deeper to gain bit more understanding.

TODO:

  • [ ] Add unit tests and/or doctests in docstrings
  • [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
  • [ ] New/modified features documented in docs/tutorial.rst
  • [ ] Changes documented in docs/release.rst
  • [ ] GitHub Actions have all passed
  • [ ] Test coverage is 100% (Codecov passes)

croth1 avatar Oct 23 '22 18:10 croth1

~Hmh, ok - the diff of the merge commit looks still a bit insane - some code from other PRs somehow leaked in/got duplicated - or~ I am just bad at interpreting the diff. ~Will investigate tomorrow what happened there.~

croth1 avatar Oct 23 '22 20:10 croth1

Kicked off the GHA workflows.

joshmoore avatar Oct 24 '22 15:10 joshmoore

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.99%. Comparing base (f361631) to head (16793f5). Report is 720 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1214    +/-   ##
========================================
  Coverage   99.99%   99.99%            
========================================
  Files          35       35            
  Lines       14136    14366   +230     
========================================
+ Hits        14135    14365   +230     
  Misses          1        1            
Files with missing lines Coverage Δ
zarr/__init__.py 100.00% <ø> (ø)
zarr/core.py 100.00% <100.00%> (ø)
zarr/creation.py 100.00% <ø> (ø)
zarr/hierarchy.py 99.78% <100.00%> (+<0.01%) :arrow_up:
zarr/storage.py 100.00% <100.00%> (ø)
zarr/tests/test_core.py 100.00% <100.00%> (ø)
zarr/tests/test_storage.py 100.00% <100.00%> (ø)

codecov[bot] avatar Oct 24 '22 15:10 codecov[bot]

Green. Are there any other commits expected from your side, @croth1?

joshmoore avatar Oct 31 '22 20:10 joshmoore

I am still not super sure whether this is the final form or I would rather re-implement it as a caching store ontop of other stores, just like the LRUStoreCache - it feels a bit more intuitive to me. Also I still want to check whether I can do write caching. If I remember correctly, this implementation is write-through. Not sure whether in combination with LRUStoreCache cached writes can be achieved. Need to do more research, but I'm a bit busy right now.

croth1 avatar Nov 01 '22 07:11 croth1

Hi, any news on when this feature will get pushed to main? It seems like a very useful feature! I want to reduce the overhead of sampling from the same chunk which is expected to be in RAM. If I'm not mistaken, right now, every time you pass an index to the zarr array, it has to decompress that index. I want to have an option to keep the entire chunk decompressed in the cache until it's discarded. Specifically, I'm using a large zarr array in my pytorch dataset module, and need to randomly iterate through samples in the same chunk before randomly picking the next chunk. Avoiding the decompression overhead is quite useful, and unfortunately I can't leave the data fully decompressed on disk, as it would be too large.

FarzanT avatar May 20 '23 03:05 FarzanT

@FarzanT @joshmoore, I have in the meantime changed my approach and would not need this anymore - hence the long silence. If that's useful for other people, we can try finishing it up as is. Would need some rebasing and performance testing, though.

IIRC, last time I checked all tests were green, although this would need a very thorough review because frequently I was not 100% sure what I was doing during the quite substantial rebase.

croth1 avatar Jun 02 '23 06:06 croth1

I'll leave @FarzanT and others to say how pressing their need is. Happy to help how I can @croth1 if you'd like to pursue this.

joshmoore avatar Jun 03 '23 15:06 joshmoore

Thank you @croth1 and @joshmoore, my primary use case has also been addressed by https://github.com/zarr-developers/zarr-python/issues/278#issuecomment-906527765, so it's not a pressing issue for me at the moment. But I'd say it would be much nicer to just flip a switch and have zarr handle this internally. If this pull request can address this, then it shouldn't be abandoned IMO!

FarzanT avatar Jun 06 '23 15:06 FarzanT

I'm going to close this as stale. Folks should feel free to reopen if there is interest in continuing this work.

jhamman avatar Oct 11 '24 23:10 jhamman