physicsnemo icon indicating copy to clipboard operation
physicsnemo copied to clipboard

🐛[BUG]: Climate data sources do not set stride for first index

Open sahnimanas opened this issue 1 year ago • 3 comments

Version

0.7

On which installation method(s) does this occur?

No response

Describe the issue

The climate data sources (ERA5DaliExternalSource and ClimateDataSourceSpec) both specify a stride parameter to read data at strides larger than what's stored in the dataset files. But the stride is only applied when reading multiple time steps (num_steps>1) The first index that is read remains independent of the stride. Example: https://github.com/NVIDIA/modulus/blob/main/modulus/datapipes/climate/era5_hdf5.py#L598

My understanding of the stride parameter is that it should allow reading subsamples of the data source even for the first index. If that's not the expected usage, this bug could instead be viewed as a feature enhancement, for what a quick addition to allow coarser-granularity subsets of the same dataset on disk (e.g. read every 6 hours from a 1-hour dataset on disk)

Minimum reproducible example

No response

Relevant log output

No response

Environment details

No response

sahnimanas avatar Aug 23 '24 05:08 sahnimanas

@loliverhennigh could you please take a look at this issue?

mnabian avatar Oct 17 '24 01:10 mnabian

I've looked at this a bit more and I think if we agree with the interpretation of stride here, then it should be sufficient to fix it by loading data[self.stride * in_idx] instead of data[in_idx]. But it may also be reasonable if one interprets the currrent definition to be consistent with the implementation.

Overall, such behavior may be left better as something for the user to extend on their own. Both the above implementations may be relevant to different users, and it's hard for them to confirm the behavior without looking at the code. If existing behavior is not what they want, then it's hard to extend the datapipe due to lack of modularity. An alternate design proposed has been proposed internally & may be more appropriate.

sahnimanas avatar Oct 17 '24 21:10 sahnimanas

Hi @sahnimanas @loliverhennigh any updates on this?

ktangsali avatar Jan 28 '25 18:01 ktangsali