pangeo-forge-recipes

Certain uses of `subset_inputs` will slow down Recipe validation.

Open alxmrs opened this issue 3 years ago • 1 comments

When we use `subset_inputs` to break a file down into many subsets (e.g. `{"time": 24, "level": 100}`), constructing a recipe class seems to take a long time. In particular, a lot of time is spent here: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/e764542d05261a0b31916c61cc3afe2a96608b81/pangeo_forge_recipes/recipes/xarray_zarr.py#L835

It's probably because the number of chunks that `iter_chunks()` yields gets really large.
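
To give a rough sense of the scale, here is a minimal sketch that counts the subsets, assuming every file is split independently along each dimension so that the subset factors multiply. That is an assumption about how the subsetting behaves, not something read off the library code, and `count_subset_chunks` and `iter_subset_keys` are hypothetical helpers, not part of pangeo-forge-recipes.

```python
from itertools import product
from math import prod


def count_subset_chunks(n_files, subset_inputs):
    """Number of (file, subset) pairs, assuming the subset factors multiply."""
    return n_files * prod(subset_inputs.values())


def iter_subset_keys(n_files, subset_inputs):
    """Yield one hypothetical key per (file, subset) combination.

    This only mimics the shape of a large chunk iteration; it is not the
    library's iter_chunks().
    """
    dims = list(subset_inputs)
    ranges = [range(subset_inputs[d]) for d in dims]
    for file_index in range(n_files):
        for combo in product(*ranges):
            yield file_index, dict(zip(dims, combo))


subsets = {"time": 24, "level": 100}
per_file = count_subset_chunks(1, subsets)     # 24 * 100 subsets for one file
many_files = count_subset_chunks(1000, subsets)  # grows linearly with file count
```

Under that assumption a single file already produces 2400 subsets, and 1000 such files produce 2.4 million, so any validation step that walks every chunk at construction time would scale with that product.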

Fixing this would be useful for processing datasets where there are both a large number of files and each file itself is large (too big for memory).

alxmrs avatar Jul 12 '22 00:07 alxmrs

Good point Alex!

This option will go away with the beam refactor.

rabernat avatar Jul 12 '22 13:07 rabernat

Indeed, `subset_inputs` no longer exists in 0.10.0, so closing. Thanks Alex!

cisaacstern avatar Aug 25 '23 18:08 cisaacstern