
Set smarter default for chunks[1] in zarr output file

Open • erikvansebille opened this issue 6 months ago • 0 comments

The current default for the Parcels output chunks is (len(pset), 1). This means that every observation written to the output file creates a new chunk, and each new chunk adds significant overhead. See also the note on output chunking in the documentation.
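To show the scale of the problem, here is a small sketch that counts how many chunks a (trajectory, observation) array needs for a given chunks setting. It is plain Python, not Parcels code. The function name n_output_chunks and the particle and observation counts are hypothetical. The sketch treats the chunk count only as a rough proxy for per-chunk overhead. Each output variable is stored as its own array, so the count applies per variable.

```python
import math


def n_output_chunks(npart: int, nobs: int, chunks: tuple[int, int]) -> int:
    """Hypothetical helper: chunks needed for one (trajectory, obs) variable.

    Every chunk is stored as a separate object, so this number is a rough
    proxy for the per-chunk overhead of writing the output.
    """
    chunk_traj, chunk_obs = chunks
    return math.ceil(npart / chunk_traj) * math.ceil(nobs / chunk_obs)


# Hypothetical run: 1000 particles, one observation per day for a year.
npart, nobs = 1000, 365
print(n_output_chunks(npart, nobs, (npart, 1)))    # current default: 365 chunks
print(n_output_chunks(npart, nobs, (npart, 365)))  # one chunk per variable
```

With the current default, the number of chunks grows linearly with the number of output steps. With a larger chunks[1], it stays small.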

We chose this default because it avoids extra NaNs in the 'observation' dimension of the output file. For example, if chunks[1]=5 and the output file has 7 observations, the observation dimension is padded to 10 and the last 3 entries are NaN. This padding may confuse users in their analysis.
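The padding in this example follows from simple arithmetic. The sketch below assumes, as the example implies, that the observation dimension is allocated in whole chunks of size chunks[1]. The function name nan_padding is hypothetical and is not part of Parcels.

```python
import math


def nan_padding(nobs: int, chunk_obs: int) -> int:
    """Hypothetical helper: trailing NaN entries in the observation dimension.

    Assumes the observation dimension grows in whole chunks of chunk_obs,
    so any unused part of the last chunk stays at the NaN fill value.
    """
    return math.ceil(nobs / chunk_obs) * chunk_obs - nobs


print(nan_padding(7, 5))  # 3, matching the example above
print(nan_padding(7, 1))  # 0: the current default never pads
```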

However, we may be able to choose a smarter default for chunks[1] than 1. For example, a default based on the ratio between runtime and outputdt would make more sense. For normal, simple simulations, the expected number of observations is roughly runtime/outputdt: floor(runtime/outputdt), possibly plus one if the initial time is also written.
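A minimal sketch of such a default follows. It is not the Parcels API. The function default_obs_chunk and its include_start flag are hypothetical. The sketch assumes output is written every outputdt from the start time up to and including runtime, with runtime and outputdt given as datetime.timedelta. The include_start flag covers the open question of whether to add one for the initial observation.

```python
from datetime import timedelta


def default_obs_chunk(runtime: timedelta, outputdt: timedelta,
                      include_start: bool = True) -> int:
    """Hypothetical default for chunks[1]: the expected number of observations.

    Assumes one observation every outputdt up to runtime, plus one for the
    initial state when include_start is True. Never returns less than 1.
    """
    if outputdt <= timedelta(0):
        raise ValueError("outputdt must be positive")
    nsteps = runtime // outputdt  # timedelta // timedelta gives an exact int
    return max(1, nsteps + (1 if include_start else 0))


npart = 1000  # hypothetical number of particles
chunks = (npart, default_obs_chunk(timedelta(days=365), timedelta(days=1)))
print(chunks)  # (1000, 366)
```

If the estimate is too large for a given run, the cost is the trailing NaN padding described above. If it is too small, the cost is only one extra chunk. For very long runs, one might also want to cap the value to limit chunk size.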

The speedup could be substantial, especially for long simulations that write many observations.

erikvansebille • Jul 31 '24 11:07