Zarr LRU cache (LRUStoreCache) not caching as expected

Open alexcpn opened this issue 1 year ago • 0 comments

Zarr version

2.14.2

Numcodecs version

0.11.0

Python Version

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux

Operating System

Linux

Installation

pip install zarr

Description

I have been using Zarr to store elevation data files for an application. I am using LRUStoreCache as shown below:

    import zarr

    # directoryStore is the path to the local Zarr directory
    store = zarr.DirectoryStore(directoryStore)  # Local store example
    cache = zarr.LRUStoreCache(store, max_size=2**32)
    dataset = zarr.open(store=cache,
                        shape=[10**3, 10**3, 1, 3612, 3612],  # first two axes are the keys to index into
                        dtype='<f8',
                        chunks=[1, 1, 1, 3612, 3612],
                        fill_value=float('NaN'),
                        compressor=None,
                        mode="r",
                        synchronizer=zarr.ThreadSynchronizer())
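
For scale, here is a rough size estimate for this configuration. It is only a sketch built from the values in the call above, and it assumes that with compressor=None each stored chunk is the raw array bytes:

```python
import struct

# Values taken from the zarr.open(...) call above.
chunk_shape = (1, 1, 1, 3612, 3612)
itemsize = struct.calcsize("<d")  # '<f8' is an 8-byte little-endian float
max_size = 2**32                  # LRUStoreCache max_size, in bytes

# Bytes in one uncompressed chunk = itemsize * product of chunk dimensions.
chunk_bytes = itemsize
for dim in chunk_shape:
    chunk_bytes *= dim

# Number of whole chunks that fit within the cache budget.
chunks_that_fit = max_size // chunk_bytes

print(f"bytes per uncompressed chunk: {chunk_bytes:,}")
print(f"whole chunks that fit in max_size: {chunks_that_fit}")
```

So each chunk is roughly 100 MB, and the 4 GiB budget holds a few dozen of them, which should be enough to keep a repeatedly requested chunk cached.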

However, when profiling with cProfile, or simply timing the calls, I find that retrieving the same key/array from the dataset goes to the file system (buffered I/O reads) multiple times.
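
One way to confirm this without a profiler would be to count reads on the underlying store directly. The sketch below uses a hypothetical CountingStore wrapper (not part of zarr) around any mapping; a plain dict stands in for the real store so the example is self-contained:

```python
from collections import Counter
from collections.abc import MutableMapping


class CountingStore(MutableMapping):
    """Hypothetical helper: counts __getitem__ calls per key on a wrapped mapping."""

    def __init__(self, inner):
        self.inner = inner
        self.reads = Counter()

    def __getitem__(self, key):
        self.reads[key] += 1
        return self.inner[key]

    def __setitem__(self, key, value):
        self.inner[key] = value

    def __delitem__(self, key):
        del self.inner[key]

    def __contains__(self, key):
        # Delegate so that membership tests are not counted as reads.
        return key in self.inner

    def __iter__(self):
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


# Stand-in for a real store; the key and value are made up for illustration.
counted = CountingStore({"0.0.0.0.0": b"chunk-bytes"})
for _ in range(3):
    counted["0.0.0.0.0"]
print(counted.reads)
```

In the setup above, the wrapper would sit between the DirectoryStore and the LRUStoreCache. Whether zarr accepts an arbitrary mapping in that position is an assumption to check against the installed version. If the cache works as expected, the count for a given chunk key should stay at 1 across repeated accesses.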

However, if I use a simple Python dictionary as shown below, the retrieval happens only once, and execution time drops drastically over repeated invocations in a loop.

I do not want to write and maintain this cache myself, including eviction against a memory threshold. I would expect LRUStoreCache to do this work.

    if key in my_cache2:
        dataset_map[key] = my_cache2[key]
    else:
        dataset_map[key] = zarrdataset[key]
        my_cache2[key] = dataset_map[key]
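
To illustrate the bookkeeping I would rather not own, here is a minimal sketch of the dictionary cache above extended with a memory threshold. ByteBoundedLRU and load are hypothetical names, and load stands in for the zarrdataset[key] lookup:

```python
from collections import OrderedDict


class ByteBoundedLRU:
    """Hypothetical LRU cache that evicts by total value size in bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self._items = OrderedDict()

    def get(self, key, load):
        """Return the cached value, calling load(key) only on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = load(key)
        size = self._size(value)
        if size <= self.max_bytes:  # values larger than the budget are not cached
            self._items[key] = value
            self.current_bytes += size
            while self.current_bytes > self.max_bytes:
                # Drop the least recently used entry until within budget.
                _, evicted = self._items.popitem(last=False)
                self.current_bytes -= self._size(evicted)
        return value

    @staticmethod
    def _size(value):
        # NumPy arrays expose nbytes; fall back to len() for bytes-like values.
        nbytes = getattr(value, "nbytes", None)
        return len(value) if nbytes is None else nbytes


loads = []


def load(key):
    loads.append(key)  # record every trip to the slow path
    return bytes(4)    # stand-in for zarrdataset[key]


cache = ByteBoundedLRU(max_bytes=8)
for key in ["a", "a", "b", "c", "a"]:
    cache.get(key, load)
print(loads)
```

Even this small version has to track sizes, recency, and eviction, and it is not thread-safe, which is why I would prefer the library's cache to handle it.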

I have used LRUStoreCache as described in the documentation, but it is clearly not caching.

Steps to reproduce

As given above

Additional output

None

alexcpn, Feb 07 '24 06:02