
🚀[FEA]: Example demonstrating how to create local dataset for inference

Open NickGeneva opened this issue 2 months ago • 1 comment

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Medium

Please provide a clear description of problem you would like to solve.

We need an example that demonstrates how to use the data sources to dump data into a Zarr store, then use that Zarr store to run inference.

This type of workflow is particularly relevant for evaluation-oriented inference workflows.

NickGeneva, Oct 19 '25 21:10

Hi @NickGeneva ,

+1 on this one. I am currently struggling to understand how I can use locally stored Zarr / NetCDF files to initialise forecasts. This example would be highly relevant for running inference on GPU nodes that have no direct internet access (e.g. on supercomputers).

I'd be happy to work on that example if you are looking for support.

Thanks!

--- More (optional) context and questions from my use case ---

  • How can I reuse the downloaded data (placed in .cache/earth2studio/<api_name>) without pinging the API again, e.g. without an internet connection? It seems that even with the persistent cache, the data source still needs to reach the API, even when the files have already been downloaded to the cache.

I am struggling to understand how locally stored files can be used directly as inputs to the model. I suspect the solution involves the classes in earth2studio/data/xr.py, such as DatasetFile.

  • How can classes such as DatasetFile be used? What are the requirements for the files (variable names, dims, ...) for DatasetFile to work seamlessly?
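To make the file-requirements question concrete, here is a small sketch of a layout check, assuming the file-backed sources open the file with xarray and select with `.sel(time=..., variable=...)`. That assumption, the helper name `check_layout` and the `REQUIRED_DIMS` tuple are hypothetical and should be verified against earth2studio/data/xr.py.

```python
import numpy as np
import xarray as xr

# Assumption: a file-backed source selects along these two dimensions.
REQUIRED_DIMS = ("time", "variable")


def check_layout(da: xr.DataArray, variables: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks usable."""
    problems = []
    for dim in REQUIRED_DIMS:
        if dim not in da.dims:
            problems.append(f"missing dimension '{dim}'")
    # Time-based selection needs real datetimes, not strings or integers.
    if "time" in da.coords and not np.issubdtype(da["time"].dtype, np.datetime64):
        problems.append("'time' coordinate is not datetime64")
    # Every requested variable must be present as a coordinate label.
    if "variable" in da.coords:
        missing = sorted(set(variables) - set(da["variable"].values.tolist()))
        if missing:
            problems.append(f"missing variables: {missing}")
    return problems


good = xr.DataArray(
    np.zeros((1, 2, 3, 4), dtype="float32"),
    dims=("time", "variable", "lat", "lon"),
    coords={
        "time": np.array(["2024-01-01T00:00"], dtype="datetime64[ns]"),
        "variable": ["t2m", "u10m"],
        "lat": np.linspace(90.0, -90.0, 3),
        "lon": np.linspace(0.0, 360.0, 4, endpoint=False),
    },
)
# Same data, but with the variable dimension under a different name.
bad = good.rename({"variable": "channel"})

print(check_layout(good, ["t2m", "u10m"]))
print(check_layout(bad, ["t2m"]))
```

Running a check like this on a file before moving it to an offline machine would catch naming mismatches early, although it says nothing about whether the grid or the variable set matches what a given model expects.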

forcadellvincent, Nov 25 '25 10:11