pangeo-forge-recipes
Certain uses of `subset_inputs` will slow down Recipe validation.
When we break a file down into many subsets (e.g. `{"time": 24, "level": 100}`), constructing the recipe class seems to take a long time. In particular, a lot of time is spent here:
https://github.com/pangeo-forge/pangeo-forge-recipes/blob/e764542d05261a0b31916c61cc3afe2a96608b81/pangeo_forge_recipes/recipes/xarray_zarr.py#L835
It's probably because the number of chunks that `iter_chunks()` yields gets really large.
Fixing this would be useful for processing datasets where there are both a large number of files and each file itself is large (too big for memory).
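To give a rough sense of the scale involved, here is a minimal sketch in plain Python. It is not the library's implementation; it assumes each file is split independently along every dimension in `subset_inputs`, so the counts multiply. `n_files` and `iter_subset_keys` are hypothetical names made up for this illustration.

```python
from itertools import product
from math import prod

# Values from the example above; n_files is an assumed number for illustration.
subset_inputs = {"time": 24, "level": 100}
n_files = 100

# Subsets per file is the product of the per-dimension subset counts.
subsets_per_file = prod(subset_inputs.values())


def iter_subset_keys(n_files, subset_inputs):
    """Hypothetical helper: yield one (file_index, subset_indices) key per subset."""
    dims = list(subset_inputs)
    ranges = [range(subset_inputs[d]) for d in dims]
    for file_index in range(n_files):
        for combo in product(*ranges):
            yield file_index, dict(zip(dims, combo))


# Any validation step that walks every key does work proportional to this total.
total = sum(1 for _ in iter_subset_keys(n_files, subset_inputs))
print(subsets_per_file, total)
```

Under these assumptions, 100 files already produce 240,000 keys, so any per-chunk work during validation adds up quickly.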
Good point Alex!
This option will go away with the beam refactor.
Indeed, `subset_inputs` no longer exists in 0.10.0, so closing. Thanks Alex!