
Support for a collection of local files

Open: mariusaurus opened this issue 1 year ago (0 comments)

Earth2Studio Pull Request

Description

This PR adds support for loading data arrays from a local directory of monthly, xarray-readable files. Other features include:

  • allowing the dtype to be defined in the datasource_to_file function in data/utils.py
  • adding an async_timeout argument to the ARCO data source, so that large chunks of data can be downloaded without timing out. The previous value of 10 min is kept as the default.
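As a rough illustration of the async_timeout change, the sketch below shows one way a configurable timeout with a 10-minute default could wrap an asynchronous download using asyncio.wait_for. The class ArcoLikeSource and its methods are hypothetical stand-ins, not the actual ARCO data source implementation.

```python
import asyncio


class ArcoLikeSource:
    """Hypothetical data source illustrating a configurable async timeout."""

    def __init__(self, async_timeout: float = 600.0) -> None:
        # Timeout in seconds; 600 s mirrors the previous 10 min default
        self.async_timeout = async_timeout

    async def _download(self, delay: float) -> str:
        # Stand-in for a network request that takes `delay` seconds
        await asyncio.sleep(delay)
        return "chunk"

    def fetch_chunk(self, delay: float) -> str:
        # Cancel the download if it exceeds the configured timeout
        return asyncio.run(
            asyncio.wait_for(self._download(delay), timeout=self.async_timeout)
        )


if __name__ == "__main__":
    default_source = ArcoLikeSource()
    print(default_source.fetch_chunk(0.01))  # finishes well within 10 min

    strict_source = ArcoLikeSource(async_timeout=0.01)
    try:
        strict_source.fetch_chunk(1.0)
    except asyncio.TimeoutError:
        print("download timed out")
```

Exposing the timeout as a constructor argument keeps the old behavior for existing callers while letting users who download large chunks raise the limit.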

Script to test the new data source:

```python
from earth2studio.data import DataArrayDirectory, fetch_data
from earth2studio.models.px import SFNO
from earth2studio.utils.time import to_time_array

# Directory containing the monthly, xarray-readable files
source_path = '/lustre/fs4/portfolios/coreai/users/mkoch/hens_ics/era5_arco'
times = to_time_array(["2020-01-01", "2020-03-01"])
source = DataArrayDirectory(source_path)

# Load SFNO to get the variables and lead times it expects as input
package = SFNO.load_default_package()
model = SFNO.load_model(package=package)
input_coords = model.input_coords()

# Fetch initial conditions for the requested times from the local files
ics, coords = fetch_data(
    source=source,
    time=times,
    variable=input_coords["variable"],
    lead_time=input_coords["lead_time"],
    device="cpu",
)

print(f"{coords['time']=}")
print(f"{coords['variable']=}")
print(f"{ics.shape=}")
```
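Because the directory holds one file per month, a source of this kind has to map each requested time to the file that contains it. The sketch below assumes a hypothetical YYYY-MM.nc naming scheme and a hypothetical helper group_times_by_monthly_file; the file naming and lookup actually used by DataArrayDirectory may differ. It groups the requested times by month so each file needs to be opened only once.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def group_times_by_monthly_file(
    root: str, times: list[datetime]
) -> dict[Path, list[datetime]]:
    """Map each monthly file (hypothetical YYYY-MM.nc naming) to the times it supplies."""
    groups: dict[Path, list[datetime]] = defaultdict(list)
    for t in times:
        # Each time falls inside exactly one monthly file
        path = Path(root) / f"{t.year:04d}-{t.month:02d}.nc"
        groups[path].append(t)
    return dict(groups)


if __name__ == "__main__":
    requested = [datetime(2020, 1, 1), datetime(2020, 1, 15), datetime(2020, 3, 1)]
    for path, ts in group_times_by_monthly_file("/data/era5", requested).items():
        print(path, [t.isoformat() for t in ts])
```

With the two times requested in the test script (2020-01-01 and 2020-03-01), only the January and March files would need to be read under this assumed layout.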

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [ ] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.
  • [x] The CHANGELOG.md is up to date with these changes.
  • [ ] An issue is linked to this pull request.

Dependencies

None

mariusaurus, Oct 01 '24 14:10