intake-esm
Proper way to handle failing `preprocess` output.
I am encountering an issue with one dataset when loading many CMIP6 datasets using intake-esm (see #331).
I believe this is actually an issue with the raw data, but either way it got me curious if there is a way to handle the following scenario properly:
Let's say I have two datasets (`ds_a`, `ds_b`) in two different zarr stores and an appropriately set up intake-esm catalog.
Now I have some preprocessing function `func`.
`func` modifies something on each dataset; it works fine on `ds_a` but fails on `ds_b`.
Currently that leads to a complete failure when reading in the full catalog with `.to_dataset_dict()`.
Is there a way to simply exclude the failing dataset but continue to process only the ones that work? This would be very helpful to me.
EDIT: On further investigation, it seems that in #331 the preprocessing is not even needed, so this question can be phrased more generally: is there a way to still output some datasets if errors come up for some of them?
@jbusecke,
Yes, I think this is doable. We could make this an optional setting that the user opts into. To make sure that this doesn't happen silently, we could raise a warning to let them know which keys failed.
Do you have suggestions on what the API would look like? I am imagining something along these lines:
`col.to_dataset_dict(...., errors='ignore')`
or
`col.to_dataset_dict(...., skip_erroneous_datasets=True)`
How about `skip_errors=True`? It's a happy medium.
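For illustration, the behaviour discussed in this thread (opt-in skipping, plus a warning that names the keys that failed) could look roughly like the sketch below. This is not intake-esm's implementation: `load_datasets`, `loaders`, and the `skip_errors` keyword are hypothetical names, and plain dicts stand in for real datasets so the example runs without any catalog.

```python
import warnings


def load_datasets(loaders, preprocess=None, skip_errors=False):
    """Load every dataset in `loaders`, optionally skipping failures.

    `loaders` maps a key to a zero-argument callable that returns a dataset.
    All names here are hypothetical; this is not intake-esm's API.
    """
    datasets = {}
    failed = {}
    for key, load in loaders.items():
        try:
            ds = load()
            if preprocess is not None:
                ds = preprocess(ds)
        except Exception as exc:
            if not skip_errors:
                # Default behaviour: one failing dataset aborts the whole load.
                raise
            failed[key] = exc
            continue
        datasets[key] = ds
    if failed:
        # Never skip silently: name every key that was dropped and why.
        details = ", ".join(
            f"{key!r} ({type(exc).__name__}: {exc})" for key, exc in failed.items()
        )
        warnings.warn(f"Skipped {len(failed)} dataset(s) that failed: {details}")
    return datasets


def func(ds):
    """Stand-in preprocessing step that fails when `time` is missing."""
    if "time" not in ds:
        raise KeyError("time")
    return ds


# Plain dicts stand in for the two zarr-backed datasets from the question.
loaders = {
    "ds_a": lambda: {"time": [0, 1]},
    "ds_b": lambda: {"lat": [0]},
}
dsets = load_datasets(loaders, preprocess=func, skip_errors=True)
```

With `skip_errors=True` only `ds_a` ends up in the result and a warning reports `ds_b`; with the default `skip_errors=False` the `KeyError` propagates, which matches the current all-or-nothing behaviour.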