intake-esm

Allow existing coordinates on "join_new"

Open aulemahal opened this issue 3 years ago • 0 comments

Is your feature request related to a problem? Please describe. This one is a bit more complex and may be too specific to be implemented here; I'll let you decide. In my database, some files have "member_id" as a dimension and others do not. I want "member_id" in the catalog for searches, and I want to aggregate the output so that member_id is a dimension in the datasets returned by to_dataset_dict.

For this I use a "join_new" aggregation, which covers the second case (files without the dimension). However, when "member_id" already exists as a dimension in the file, this fails; for those files I would have needed a "join_existing" aggregation instead.
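For context, the kind of aggregation control involved might look like the fragment below, written here as a Python dict rather than the collection JSON. The column names ("variable_id", "source_id", "experiment_id") and the groupby choice are assumptions for illustration, not taken from the actual catalog.

```python
# Hypothetical aggregation_control section of an ESM collection,
# expressed as a Python dict. Column names are illustrative assumptions.
aggregation_control = {
    "variable_column_name": "variable_id",
    "groupby_attrs": ["source_id", "experiment_id"],
    "aggregations": [
        # "join_new" asks for a new "member_id" dimension to be created
        # on each file before the files are concatenated along it.
        {"type": "join_new", "attribute_name": "member_id"},
    ],
}

# Collect the columns that are aggregated with "join_new".
join_new_columns = [
    agg["attribute_name"]
    for agg in aggregation_control["aggregations"]
    if agg["type"] == "join_new"
]
```

With this setup, a file that already carries "member_id" as a dimension is the case that triggers the failure described above.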

Describe the solution you'd like When using "join_new", I'd like no error to be raised if the coordinate already exists on the dataset.

Describe alternatives you've considered Two catalogs? Split all my files on disk? Remove member_id from the aggregation control?

Additional context I tried something on a branch. It works by modifying source._expand_dims. If the dimension exists and has the correct coordinate, no expansion is done. See my changes here:

https://github.com/Ouranosinc/intake-esm/blob/4aa37fda43139b21b4feecf7457dd85f36c12856/intake_esm/source.py#L96-L117

and also further down, lines 232-239, so that iterable fields are passed correctly.
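The idea can be sketched as follows. This is a minimal illustration using plain xarray, not the code on the linked branch: the helper name `expand_dims_if_missing` is hypothetical, and it assumes the catalog value is either a single member id or a list/tuple of ids when the file already holds several members.

```python
import numpy as np
import xarray as xr


def expand_dims_if_missing(ds, dim, value):
    """Add `dim` as a new dimension unless `ds` already has it.

    Hypothetical helper: `value` is the catalog entry for `dim`,
    either a scalar or a list/tuple of coordinate values.
    """
    expected = list(value) if isinstance(value, (list, tuple)) else [value]
    if dim in ds.dims:
        # The dimension exists: accept it only if its coordinate
        # matches what the catalog says, otherwise fail loudly.
        if dim in ds.coords and ds[dim].values.tolist() == expected:
            return ds
        raise ValueError(
            f"{dim!r} exists on the dataset but does not match {expected}"
        )
    # The dimension is absent: create it with the catalog value(s).
    return ds.expand_dims({dim: expected})


# File without member_id: the dimension is created (the "join_new" case).
ds_without = xr.Dataset({"tas": ("time", np.arange(3.0))})
out_new = expand_dims_if_missing(ds_without, "member_id", "r1i1p1f1")

# File that already has member_id: returned unchanged, no error raised.
ds_with = xr.Dataset(
    {"tas": (("member_id", "time"), np.zeros((2, 3)))},
    coords={"member_id": ["r2i1p1f1", "r3i1p1f1"]},
)
out_existing = expand_dims_if_missing(
    ds_with, "member_id", ("r2i1p1f1", "r3i1p1f1")
)
```

Raising on a mismatched coordinate is a design choice in this sketch; it keeps a stale catalog entry from silently producing mislabeled members.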

Is this an acceptable feature?

aulemahal, Dec 15 '21 22:12