[WIP] Attempt to continue LRU cache for decoded chunks
Continuation attempt of #306 - do not merge yet - ~~very likely still not correct behaviour when chunks get deleted from the store by #738~~.
I still have very limited knowledge of the code base - it will require me to dig a bit deeper to gain a bit more understanding.
TODO:
- [ ] Add unit tests and/or doctests in docstrings
- [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
- [ ] New/modified features documented in docs/tutorial.rst
- [ ] Changes documented in docs/release.rst
- [ ] GitHub Actions have all passed
- [ ] Test coverage is 100% (Codecov passes)
~~Hmh, ok - the diff of the merge commit still looks a bit insane - some code from other PRs somehow leaked in/got duplicated - or~~ I am just bad at interpreting the diff. ~~Will investigate tomorrow what happened there.~~
Kicked off the GHA workflows.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.99%. Comparing base (f361631) to head (16793f5). Report is 720 commits behind head on main.
Additional details and impacted files
```diff
@@            Coverage Diff            @@
##              main    #1214    +/-   ##
==========================================
  Coverage    99.99%   99.99%
==========================================
  Files           35       35
  Lines        14136    14366   +230
==========================================
+ Hits         14135    14365   +230
  Misses           1        1
```
| Files with missing lines | Coverage Δ | |
|---|---|---|
| zarr/__init__.py | 100.00% <ø> (ø) | |
| zarr/core.py | 100.00% <100.00%> (ø) | |
| zarr/creation.py | 100.00% <ø> (ø) | |
| zarr/hierarchy.py | 99.78% <100.00%> (+<0.01%) | :arrow_up: |
| zarr/storage.py | 100.00% <100.00%> (ø) | |
| zarr/tests/test_core.py | 100.00% <100.00%> (ø) | |
| zarr/tests/test_storage.py | 100.00% <100.00%> (ø) | |
Green. Are there any other commits expected from your side, @croth1?
I am still not entirely sure whether this is the final form, or whether I would rather re-implement it as a caching store on top of other stores, just like the LRUStoreCache - that feels a bit more intuitive to me. I also still want to check whether I can do write caching; if I remember correctly, this implementation is write-through. Not sure whether cached writes can be achieved in combination with LRUStoreCache. Need to do more research, but I'm a bit busy right now.
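For reference, a minimal sketch of the existing zarr-python v2 `LRUStoreCache` pattern this comment compares against (the store path and `max_size` are illustrative). It caches *encoded* chunks only, which is why a decoded-chunk cache like this PR would still save the per-read decompression cost:

```python
import zarr

# Wrap a base store with the existing LRU cache for *encoded* chunks
# (illustrative path and max_size; not part of this PR).
base_store = zarr.DirectoryStore('data/example.zarr')
cached_store = zarr.LRUStoreCache(base_store, max_size=2**28)  # ~256 MiB
z = zarr.open(cached_store, mode='r')

# Repeated reads of the same chunk hit the in-memory cache, but each read
# still decompresses the chunk, since only the encoded bytes are cached.
sample = z[0:100]
```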
Hi, any news on when this feature will get pushed to main? It seems like a very useful feature! I want to reduce the overhead of sampling repeatedly from the same chunk, which is expected to be in RAM. If I'm not mistaken, right now, every time you pass an index to the zarr array, it has to decompress the chunk containing that index. I want an option to keep the entire chunk decompressed in the cache until it's discarded. Specifically, I'm using a large zarr array in my pytorch dataset module and need to randomly iterate through samples in the same chunk before randomly picking the next chunk. Avoiding the decompression overhead would be quite useful, and unfortunately I can't leave the data fully decompressed on disk, as it would be too large.
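A hedged sketch of how the chunk-wise sampling pattern described above can be approximated today without this PR: read one chunk into a NumPy array, sample within it, then move on. The array path and the assumption that samples lie along axis 0 are illustrative, not taken from the PR:

```python
import numpy as np
import zarr

z = zarr.open('data/example.zarr', mode='r')  # illustrative path
chunk_len = z.chunks[0]                       # assumes samples along axis 0
n_chunks = (z.shape[0] + chunk_len - 1) // chunk_len

for chunk_idx in np.random.permutation(n_chunks):
    start = chunk_idx * chunk_len
    stop = min(start + chunk_len, z.shape[0])
    chunk = z[start:stop]                     # chunk decompressed once into RAM
    for i in np.random.permutation(len(chunk)):
        sample = chunk[i]                     # no per-sample decompression
```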
@FarzanT @joshmoore, I have in the meantime changed my approach and no longer need this myself - hence the long silence. If it's useful for other people, we can try finishing it up as is. It would need some rebasing and performance testing, though.
IIRC, last time I checked all tests were green, although this would need a very thorough review, because I frequently wasn't 100% sure what I was doing during the fairly substantial rebase.
I'll leave @FarzanT and others to say how pressing their need is. Happy to help however I can, @croth1, if you'd like to pursue this.
Thank you @croth1 and @joshmoore, my primary use case has also been addressed by https://github.com/zarr-developers/zarr-python/issues/278#issuecomment-906527765, so it's not a pressing issue for me at the moment. But I'd say it would be much nicer to just flip a switch and have zarr handle this internally. If this pull request can address this, then it shouldn't be abandoned IMO!
I'm going to close this as stale. Folks should feel free to reopen if there is interest in continuing this work.