basic pluggable cache implementation

Open jhamman opened this issue 3 years ago • 1 comments

Description of proposed changes

This PR adds a new cache feature to Xbatcher's BatchGenerator. As implemented, it requires Zarr to serialize Xarray datasets. The cache itself is entirely pluggable: any dict-like object can serve as the backing store.

I'm putting this up to help foster discussion in #109. I'm still not sure it's the best path forward, but I'd like to get some feedback, and this felt like a tangible way to test the idea out.

If you want to try this out, here is an example session:

In [1]: import xarray as xr

In [2]: import xbatcher

In [3]: import zarr

In [4]: cache = zarr.storage.DirectoryStore('/flash/fast/storage/cache')

In [5]: ds = xr.tutorial.open_dataset('air_temperature')

In [6]: gen = xbatcher.BatchGenerator(ds, input_dims={'lat': 10, 'lon': 10}, cache=cache)

In [7]: %%time
   ...: for b in gen:
   ...:     pass
   ...: 
CPU times: user 95 ms, sys: 40.8 ms, total: 136 ms
Wall time: 146 ms

In [8]: %%time
   ...: for b in gen:
   ...:     pass
   ...: 
CPU times: user 59.6 ms, sys: 11.4 ms, total: 70.9 ms
Wall time: 65.5 ms

Note that I used a directory store here, but this could be any zarr-friendly store (e.g. S3 or Redis).
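For readers who want the pluggable-cache idea in isolation, here is a minimal, library-free sketch. It is not the code in this PR: `CachedBatches`, `make_batch`, and the `batch/<idx>` key layout are hypothetical names chosen for illustration, and `pickle` stands in for the Zarr serialization the PR actually relies on. The only thing it tries to show is that a store which behaves like a dict (supports `in`, item get, and item set) is enough to skip regenerating batches on a second pass.

```python
import pickle


class CachedBatches:
    """Hypothetical batch generator that memoizes batches in a dict-like store."""

    def __init__(self, make_batch, n_batches, cache=None):
        self.make_batch = make_batch  # callable: batch index -> batch object
        self.n_batches = n_batches
        self.cache = cache            # any mutable mapping, or None to disable

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        if not 0 <= idx < self.n_batches:
            raise IndexError(idx)
        key = f"batch/{idx}"          # assumed key layout, one entry per batch
        if self.cache is not None and key in self.cache:
            # Cache hit: deserialize instead of regenerating the batch.
            return pickle.loads(self.cache[key])
        batch = self.make_batch(idx)
        if self.cache is not None:
            # Cache miss: store the serialized bytes for the next pass.
            self.cache[key] = pickle.dumps(batch)
        return batch

    def __iter__(self):
        for idx in range(len(self)):
            yield self[idx]


calls = []  # records which batches were actually generated


def make_batch(idx):
    calls.append(idx)
    return list(range(idx * 3, idx * 3 + 3))


store = {}  # a plain dict; any other mutable mapping would work the same way
gen = CachedBatches(make_batch, n_batches=4, cache=store)
first_pass = list(gen)   # generates every batch and fills the store
second_pass = list(gen)  # served entirely from the store
```

After both passes, `calls` holds each index only once, since the second pass never calls `make_batch`. Storing serialized bytes rather than live objects is what lets the same pattern work with stores that persist to disk or a remote service.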

jhamman avatar Oct 22 '22 00:10 jhamman

Codecov Report

Merging #115 (ea7f128) into main (77c470b) will decrease coverage by 5.71%. The diff coverage is 42.85%.

@@             Coverage Diff             @@
##              main     #115      +/-   ##
===========================================
- Coverage   100.00%   94.28%   -5.72%     
===========================================
  Files            5        5              
  Lines          192      210      +18     
  Branches        35       39       +4     
===========================================
+ Hits           192      198       +6     
- Misses           0        9       +9     
- Partials         0        3       +3     
Impacted Files            Coverage Δ
xbatcher/generators.py    88.34% <42.85%> (-11.66%) :arrow_down:


codecov-commenter avatar Oct 22 '22 03:10 codecov-commenter

@jhamman, do you mind if I open a new PR based on your work here?

maxrjones avatar Jan 05 '23 21:01 maxrjones

Go for it!

jhamman avatar Jan 05 '23 21:01 jhamman