intake-esm
Allow existing coordinates on "join_new"
Is your feature request related to a problem? Please describe.
Ok so this one is a bit more complex, and may be too specific to be implemented here. I'll let you decide. In my database, there are some files where "member_id" is a dimension and some others where it's not. I want "member_id" in the catalog for searches, and I want to aggregate the output so that member_id is a dimension in the datasets returned by to_dataset_dict.
For this I use a "join_new" aggregation, which covers the second case. However, when "member_id" already exists as a dimension in the file, this fails. For those files, I should have used a "join_existing" aggregation.
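To make the failure concrete, here is a minimal sketch in plain xarray. It assumes that "join_new" behaves roughly like expand_dims followed by concat along the new dimension; the datasets and member names are made up for illustration.

```python
import numpy as np
import xarray as xr

# File A: member_id is NOT a dimension; the member is only known from the catalog.
ds_a = xr.Dataset({"tas": ("time", np.arange(3.0))}, coords={"time": [0, 1, 2]})

# File B: member_id is ALREADY a dimension, with its own coordinate.
ds_b = xr.Dataset(
    {"tas": (("member_id", "time"), np.arange(3.0).reshape(1, 3))},
    coords={"member_id": ["r2i1p1"], "time": [0, 1, 2]},
)

# "join_new"-style step: add member_id as a new dimension. This works for file A.
a_expanded = ds_a.expand_dims(member_id=["r1i1p1"])

# The same step on file B raises ValueError because the dimension already exists.
try:
    ds_b.expand_dims(member_id=["r2i1p1"])
    failed = False
except ValueError:
    failed = True

# "join_existing"-style step: concatenate along the dimension both datasets now share.
combined = xr.concat([a_expanded, ds_b], dim="member_id")
```

Only the expansion step fails; once both datasets carry member_id, concatenating along it is straightforward.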
Describe the solution you'd like
When using "join_new" on a coordinate that already exists in the dataset, I'd like no error to be raised.
Describe alternatives you've considered
Two catalogs? Split all my files on disk? Remove member_id from the aggregation control?
Additional context
I tried something on a branch. It works by modifying source._expand_dims. If the dimension exists and has the correct coordinate, no expansion is done. See my changes here:
https://github.com/Ouranosinc/intake-esm/blob/4aa37fda43139b21b4feecf7457dd85f36c12856/intake_esm/source.py#L96-L117
and also further down, lines 232-239, so that iterable fields are passed correctly.
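The idea behind the branch can be sketched as a guarded expansion. This is not the code on the branch: expand_dims_if_missing is a hypothetical helper written with plain xarray. It skips the expansion when the dimension already exists with a coordinate matching the catalog value(s), and it accepts either a scalar or an iterable catalog value.

```python
import numpy as np
import xarray as xr


def expand_dims_if_missing(ds, dim, values):
    """Hypothetical helper: add `dim` only if the dataset does not already have it.

    `values` is the catalog value for this file, e.g. "r1i1p1" or ["r1i1p1"].
    """
    # Normalize a scalar catalog value (strings included) to a one-element list.
    if np.ndim(values) == 0:
        values = [values]
    values = list(values)

    if dim in ds.dims:
        # The dimension already exists: accept it only if its coordinate matches the catalog.
        if dim in ds.coords and list(ds[dim].values) == values:
            return ds
        raise ValueError(f"{dim!r} exists but does not match catalog values {values}")

    # The dimension is missing: behave like a plain "join_new" expansion.
    return ds.expand_dims({dim: values})


# Usage: file A lacks member_id, file B already carries it with a matching coordinate.
ds_a = xr.Dataset({"tas": ("time", np.arange(3.0))}, coords={"time": [0, 1, 2]})
ds_b = xr.Dataset(
    {"tas": (("member_id", "time"), np.arange(3.0).reshape(1, 3))},
    coords={"member_id": ["r2i1p1"], "time": [0, 1, 2]},
)
combined = xr.concat(
    [
        expand_dims_if_missing(ds_a, "member_id", "r1i1p1"),
        expand_dims_if_missing(ds_b, "member_id", ["r2i1p1"]),
    ],
    dim="member_id",
)
```

Raising on a mismatched coordinate, rather than silently keeping it, is a design choice in this sketch. It keeps the catalog as the source of truth for the aggregation.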
Is this an acceptable feature?