
Reading in large amount of reader files: memory limit

Open calquigs opened this issue 11 months ago • 4 comments

I am working with SCHISM model files that each contain a single time step. At the moment I am reading in two months' worth of files using:

from opendrift.readers import reader_schism_native

data_path0 = '/<PATH>/schout_*.nc'
reader0 = reader_schism_native.Reader(data_path0, proj4='+proj=utm +zone=4 +ellps=WGS84 +datum=WGS84 +units=m +no_defs')

However, that kills the run by exceeding the memory limit. Each timestep/model file is 270 MB, so is creating the reader attempting to allocate 388 GB of memory? Is there a better way to create the readers so they only access the timesteps one at a time?

calquigs avatar Mar 06 '24 23:03 calquigs

In this case the dataset is opened with xarray's open_mfdataset: https://github.com/OpenDrift/opendrift/blob/master/opendrift/readers/reader_schism_native.py#L113 Maybe there is a memory leak there?

In the generic reader, some additional options are passed to open_mfdataset: https://github.com/OpenDrift/opendrift/blob/master/opendrift/readers/reader_netCDF_CF_generic.py#L100 Could you try whether any of these options solve the problem? I do not have any SCHISM files available for testing.
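For reference, the effect of those open_mfdataset options can be checked on small synthetic files. This is only a sketch: the option names (chunks, data_vars, coords, compat, combine) are standard xarray arguments, but the variable name "temp" and the file layout here are made up and do not match real SCHISM output:

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Build two tiny single-timestep files standing in for schout_*.nc.
tmpdir = tempfile.mkdtemp()
for i in range(2):
    ds = xr.Dataset(
        {"temp": (("time", "node"), np.random.rand(1, 4))},
        coords={"time": [np.datetime64("2024-03-01") + np.timedelta64(i, "h")]},
    )
    ds.to_netcdf(os.path.join(tmpdir, f"schout_{i}.nc"))

# These options keep the combined dataset lazy (dask-chunked) instead of
# pulling all timesteps into memory at open time (requires dask).
combined = xr.open_mfdataset(
    os.path.join(tmpdir, "schout_*.nc"),
    chunks={"time": 1},    # one timestep per dask chunk
    data_vars="minimal",   # only concatenate variables with a time dimension
    coords="minimal",
    compat="override",
    combine="by_coords",
)
print(combined["temp"].chunks)  # chunk layout of the lazy dask array
```

If the data stay lazy here but memory still blows up during a run, the problem is more likely in how the reader later loads slices than in the open call itself.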

knutfrode avatar Mar 11 '24 17:03 knutfrode

I've tried adding those arguments and I'm still getting the same issue. To confirm: is the intended behavior to read the files in as needed, or does the simulation need to hold all the reader files in memory at once?

calquigs avatar Mar 12 '24 11:03 calquigs

Update: reading in 2000 hourly timesteps with 'schout_*.nc' kills the run due to the memory limit, but if I read the files into multiple readers of smaller chunks of between 100 and 1000 files each (e.g. 'schout_??.nc', 'schout_???.nc', 'schout_1???.nc'), the memory limit is not reached and I'm able to successfully complete a simulation! It takes 20+ minutes to read everything in, though; does that seem reasonable for this amount of data?
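The glob-pattern workaround above can be generalized by splitting the sorted file list into fixed-size chunks and creating one reader per chunk. chunk_files below is a hypothetical helper, and the commented-out reader call assumes Reader also accepts a list of files (which the underlying open_mfdataset supports):

```python
import glob


def chunk_files(pattern_or_files, size=500):
    """Split a glob pattern or file list into consecutive sorted chunks
    of at most `size` files each."""
    if isinstance(pattern_or_files, str):
        files = sorted(glob.glob(pattern_or_files))
    else:
        files = sorted(pattern_or_files)
    return [files[i:i + size] for i in range(0, len(files), size)]


# Each chunk would then get its own reader, e.g. (untested sketch):
# readers = [reader_schism_native.Reader(subset, proj4=...)
#            for subset in chunk_files('/<PATH>/schout_*.nc')]
# o.add_reader(readers)
```

This keeps each open_mfdataset call over a bounded number of files, at the cost of OpenDrift switching readers at the chunk boundaries.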

calquigs avatar Mar 19 '24 17:03 calquigs

See this parallel issue: https://github.com/OpenDrift/opendrift/discussions/1241#discussioncomment-8869454

So you could also try installing h5netcdf (conda install h5netcdf) and adding engine="h5netcdf" to the open_mfdataset call in the SCHISM reader.

knutfrode avatar Mar 21 '24 18:03 knutfrode