zarr-python
Zarr LRU cache (LRUStoreCache) not caching as expected
Zarr version
2.14.2
Numcodecs version
0.11.0
Python Version
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Operating System
Linux
Installation
pip install zarr
Description
I have been using Zarr to store elevation data for an application, with LRUStoreCache set up as below:
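```python
import zarr

store = zarr.DirectoryStore(directoryStore)  # local store; directoryStore is the path to the array
cache = zarr.LRUStoreCache(store, max_size=2**32)  # cache up to 4 GiB of chunk data
dataset = zarr.open(store=cache,
                    shape=[10**3, 10**3, 1, 3612, 3612],  # first two dimensions are the keys to index into
                    dtype='<f8',
                    chunks=[1, 1, 1, 3612, 3612],
                    fill_value=float('NaN'),
                    compressor=None,
                    mode="r",  # read-only: shape, dtype, chunks etc. come from the existing .zarray metadata
                    synchronizer=zarr.ThreadSynchronizer())
```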
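For reference, a quick check of the sizes implied by the array definition above, assuming each chunk is stored uncompressed (`compressor=None`), so its on-disk size equals its in-memory size of 8 bytes per `'<f8'` element:

```python
import math

# Sizes implied by the array definition above (float64, no compressor).
chunk_shape = (1, 1, 1, 3612, 3612)
itemsize = 8          # bytes per '<f8' element
max_size = 2**32      # LRUStoreCache budget used above

chunk_bytes = itemsize * math.prod(chunk_shape)
chunks_that_fit = max_size // chunk_bytes

print(chunk_bytes, round(chunk_bytes / 2**20, 1), chunks_that_fit)
# -> 104372352 99.5 41
```

Under that assumption each chunk is roughly 100 MiB and the 4 GiB budget holds about 41 of them, so repeatedly reading a single key should fit comfortably in the cache.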
However, when profiling with cProfile (or simply timing the calls), I find that retrieving the same key/array from the dataset goes to the file system (`_io.BufferedReader.read` in the profile) multiple times.
By contrast, if I cache the results in a plain Python dictionary, as below, each array is read only once and the execution time drops drastically over repeated invocations in a loop.
However, I do not want to write and maintain the logic for evicting entries from such a cache once it exceeds a memory threshold; I expect LRUStoreCache to do that work.
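```python
my_cache2 = {}  # plain dict of already-retrieved arrays, created once outside the loop

if key in my_cache2:
    dataset_map[key] = my_cache2[key]      # served from memory
else:
    dataset_map[key] = zarrdataset[key]    # read from the Zarr array
    my_cache2[key] = dataset_map[key]
```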
I have set up LRUStoreCache as described in the documentation, but it is clearly not caching.
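The dictionary workaround grows without bound. As a stopgap sketch (not a zarr API), the hypothetical `ByteBoundedLRU` class below caches retrieved arrays and evicts the least recently used ones once their total size exceeds a byte budget, which is the memory-threshold bookkeeping described above. It assumes values expose `.nbytes` (as NumPy arrays do) and is not thread-safe, so it would need a lock if shared across threads.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ByteBoundedLRU:
    """Hypothetical helper: LRU cache of loaded values bounded by total bytes."""

    def __init__(self, loader: Callable[[Hashable], Any], max_bytes: int,
                 sizeof: Callable[[Any], int] = lambda v: v.nbytes):
        self._loader = loader        # e.g. lambda key: zarrdataset[key]
        self._max_bytes = max_bytes
        self._sizeof = sizeof        # NumPy arrays expose .nbytes
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._current = 0            # total size of cached values, in bytes
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key: Hashable) -> Any:
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._loader(key)
        size = self._sizeof(value)
        if size <= self._max_bytes:  # never cache a value larger than the whole budget
            self._entries[key] = value
            self._current += size
            while self._current > self._max_bytes:
                _, old = self._entries.popitem(last=False)  # evict least recently used
                self._current -= self._sizeof(old)
        return value
```

With the array from this issue it might be used as `arrays = ByteBoundedLRU(lambda key: zarrdataset[key], max_bytes=2**32)`, replacing the `my_cache2` lookup with `arrays[key]`. Cached arrays are shared objects, so mutating one changes the cached copy. If counting entries is good enough, `functools.lru_cache(maxsize=40)` on a loader function gives a similar bound, since 40 chunks of about 100 MiB is just under 4 GiB.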
Steps to reproduce
As given above
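To confirm which chunk keys actually reach the disk, independent of cProfile, one option is to wrap the DirectoryStore in a small counting mapping before passing it to LRUStoreCache. `CountingStore` below is a hypothetical diagnostic helper, not part of zarr; it assumes zarr 2.x, where LRUStoreCache accepts a plain MutableMapping as its underlying store.

```python
from collections import Counter
from collections.abc import MutableMapping


class CountingStore(MutableMapping):
    """Hypothetical diagnostic wrapper: counts reads per key on the wrapped store."""

    def __init__(self, store):
        self._store = store
        self.reads = Counter()  # key -> number of reads that reached the wrapped store

    def __getitem__(self, key):
        self.reads[key] += 1
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __contains__(self, key):
        # Delegate directly so membership checks are not counted as reads.
        return key in self._store
```

For example, build `counting = CountingStore(zarr.DirectoryStore(directoryStore))`, pass it as `zarr.LRUStoreCache(counting, max_size=2**32)`, and open the array from the cache as before. After reading the same key several times, `counting.reads.most_common(5)` lists the keys fetched most often; a chunk key with a count above 1 was read from the file system more than once. In zarr 2.x, LRUStoreCache also keeps its own `hits` and `misses` counters, which can be inspected the same way.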
Additional output
None