gollum
Lower the barrier to entry for setting up model grids: downloading the full grid
Downloading, unzipping, and moving the model grids into the right location is time-consuming, platform-dependent, and error-prone. We want a solution that provides access to the model grids quickly and reliably.
There are two ideas:
- Provide a script to download and extract the native tar.gz files.
  Pros: does not introduce a new standard that we have to maintain; respects the original location.
  Cons: difficult to design a platform-independent script capable of downloading and unzipping.
- Store the raw grid in a binary format (e.g. HDF5, parquet, arrow, numpy arrays, pickle files, feather, etc.).
  Pros: possibly very fast and efficient; only one file to download (not 6+ individual tar files).
  Cons: we would have to maintain the files ourselves (where do they get stored long-term, what READMEs go with them, and redistribution might even go against the preferences of the original authors), and the grids may be so big that storing them as a single massive array will exceed some machines' memory limits.
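For what it's worth, option 1's platform-independence concern may be smaller than it looks: Python's standard library alone can download and unpack a tar.gz identically on Linux, macOS, and Windows. A minimal sketch (the function name and URL layout here are hypothetical, not an existing gollum API):

```python
import tarfile
import urllib.request
from pathlib import Path


def download_and_extract(url: str, dest: str) -> Path:
    """Download one model-grid tarball and unpack it under `dest`.

    Uses only the standard library (urllib + tarfile), so it behaves
    the same on every platform Python runs on.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last path component of the URL.
    archive = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)  # unpack next to the archive
    return dest_dir
```

The main remaining cost of this route is bookkeeping (checksums, resuming partial downloads, and knowing each grid's native directory layout), not the unzipping itself.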
In a discussion with @astrocaroline and @Jiayi-Cao today we agreed that option 2's benefits win the day. We'll want a format like HDF5 that allows granular access to sub-portions of the array without reading the entire dataset into memory.
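To make the granular-access point concrete, here is a minimal sketch with h5py, assuming a hypothetical layout where the grid is one 3-D flux cube indexed by (Teff, logg, wavelength); the file name, shape, and chunk layout are illustrative, not a committed design:

```python
import h5py

PATH = "grid_demo.hdf5"  # hypothetical file name

# Write side: chunk the cube so each (Teff, logg) spectrum is its own chunk.
with h5py.File(PATH, "w") as f:
    flux = f.create_dataset(
        "flux",
        shape=(60, 10, 30_000),        # (Teff, logg, wavelength), illustrative
        dtype="f4",
        chunks=(1, 1, 30_000),         # one chunk per model spectrum
    )
    flux[0, 0, :] = 1.0                # store a single model spectrum

# Read side: slicing pulls only the touched chunks off disk,
# so one spectrum can be read without loading the whole cube.
with h5py.File(PATH, "r") as f:
    spectrum = f["flux"][0, 0, :]      # a single spectrum, not the full array
```

Chunked storage like this is what lets the single-file approach sidestep the memory-limit concern from option 2: the full cube never has to be resident in RAM.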