xbatcher
basic pluggable cache implementation
Description of proposed changes
This PR adds a new cache feature to xbatcher's BatchGenerator. As implemented, it requires Zarr to serialize Xarray datasets. The cache itself is entirely pluggable: it accepts any dict-like object as the backing store.
I'm putting this up to help foster discussion in #109. I'm still not sure it's the best path forward, but I'd like to get some feedback, and this felt like a tangible way to test the idea out.
If you want to try this out, here is an example session:
In [1]: import xarray as xr
In [2]: import xbatcher
In [3]: import zarr
In [4]: cache = zarr.storage.DirectoryStore('/flash/fast/storage/cache')
In [5]: ds = xr.tutorial.open_dataset('air_temperature')
In [6]: gen = xbatcher.BatchGenerator(ds, input_dims={'lat': 10, 'lon': 10}, cache=cache)
In [7]: %%time
   ...: for b in gen:
   ...:     pass
   ...:
CPU times: user 95 ms, sys: 40.8 ms, total: 136 ms
Wall time: 146 ms
In [8]: %%time
   ...: for b in gen:
   ...:     pass
   ...:
CPU times: user 59.6 ms, sys: 11.4 ms, total: 70.9 ms
Wall time: 65.5 ms
Note that I used a directory store here, but this could be any Zarr-friendly store (e.g. S3, Redis, etc.).
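To make the "any dict-like object" idea concrete, here is a minimal, standard-library-only sketch of the pattern. It is not the code in this PR: `CachedBatches` and `make_batch` are hypothetical names, and `pickle` stands in for the Zarr serialization that the PR actually uses. The only thing it is meant to show is that the store needs nothing beyond `__contains__`, `__getitem__`, and `__setitem__`.

```python
import pickle
from collections.abc import Callable, Iterator, MutableMapping
from typing import Any


class CachedBatches:
    """Hypothetical stand-in for a batch generator with a pluggable cache.

    Assumption: batches are serialized with pickle here; the PR itself
    serializes Xarray datasets with Zarr instead.
    """

    def __init__(
        self,
        n_batches: int,
        make_batch: Callable[[int], Any],
        cache: MutableMapping[str, bytes],
    ) -> None:
        self.n_batches = n_batches
        self.make_batch = make_batch
        self.cache = cache  # any dict-like object works

    def __getitem__(self, idx: int) -> Any:
        key = f"batch/{idx}"
        if key in self.cache:
            # Cache hit: deserialize instead of recomputing the batch.
            return pickle.loads(self.cache[key])
        batch = self.make_batch(idx)
        self.cache[key] = pickle.dumps(batch)
        return batch

    def __iter__(self) -> Iterator[Any]:
        for idx in range(self.n_batches):
            yield self[idx]


# Usage: a plain dict plays the role of the store.
calls: list[int] = []


def make_batch(idx: int) -> list[int]:
    calls.append(idx)  # record each time a batch is actually computed
    return list(range(idx * 3, idx * 3 + 3))


store: dict[str, bytes] = {}
gen = CachedBatches(4, make_batch, store)
first_pass = list(gen)   # computes every batch and fills the cache
second_pass = list(gen)  # served entirely from the cache
```

Because the store is only accessed through the mapping interface, the plain `dict` above could be swapped for a directory-backed or remote mapping without touching the generator logic.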
Codecov Report
Merging #115 (ea7f128) into main (77c470b) will decrease coverage by 5.71%. The diff coverage is 42.85%.
@@ Coverage Diff @@
## main #115 +/- ##
===========================================
- Coverage 100.00% 94.28% -5.72%
===========================================
Files 5 5
Lines 192 210 +18
Branches 35 39 +4
===========================================
+ Hits 192 198 +6
- Misses 0 9 +9
- Partials 0 3 +3
| Impacted Files | Coverage Δ | |
|---|---|---|
| xbatcher/generators.py | 88.34% <42.85%> (-11.66%) | :arrow_down: |
@jhamman, do you mind if I open a new PR based on your work here?
Go for it!