
User defined function to deal with Merge Conflicts

Open HCookie opened this issue 1 year ago • 3 comments

Is your feature request related to a problem?

When merging data with overlapping values at the same coordinates, the only current options to avoid a MergeError are to select the variable from the first dataset or to remove the offending areas.

This does not work when an element-wise merge is desired, where the resulting data is the result of an operation on the overlapping data.

An example is two heatmap datasets with an overlap, where the overlap should be the average of the two separate datasets.

Describe the solution you'd like

To keep this feature extensible and customisable, I would like to be able to provide a user-defined function that receives the overlapping values element-wise and returns a single value to be merged into the final variable.

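import statistics
import xarray

def average_overlap(*values):
    # Reduce the overlapping values at one coordinate to a single value.
    return statistics.mean(values)

# Proposed API: merge_func is the requested parameter, not an existing one.
xarray.merge([data_1, data_2], merge_func=average_overlap)
# Where data_1 and data_2 contain overlapping data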

My concerns with this feature are its scalability for large overlapping regions and its integration into the current merge code structure.

Describe alternatives you've considered

I've looked into implementing this myself within xarray, but cannot find the place to insert this feature.

As the data I am using is NetCDF, xarray remains an excellent tool to use, and I would rather not switch to another method.

Additional context

No response

HCookie avatar Jul 29 '22 02:07 HCookie

Hi @HCookie, unless I'm missing something it may be possible to implement that fairly easily with concatenation then reduction? E.g., something like:

concatenated = xarray.concat([data_1, data_2], "new_dim", join="inner")
avg_overlap = concatenated.mean("new_dim")

benbovy avatar Jul 29 '22 11:07 benbovy

@benbovy, an interesting approach; with some fiddling it may work. However, a verbatim implementation keeps only the overlap, whereas my use case requires all data to be kept, with only the overlap handled specially. I'll look into it.

HCookie avatar Aug 01 '22 00:08 HCookie

How about this?

concatenated = xarray.concat([data_1, data_2], "new_dim", join="outer")
avg_overlap = concatenated.mean("new_dim", skipna=True)

benbovy avatar Aug 01 '22 07:08 benbovy
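
To make the difference between the two suggestions in this thread concrete, here is a minimal sketch on toy data. The arrays, their values, and the variable names `overlap_only` and `merged` are assumptions made up for illustration; only `xarray.concat` and `mean` come from the thread. It is meant as an illustration of the concatenate-then-reduce idea, not a drop-in replacement for `xarray.merge`.

```python
import xarray as xr

# Two hypothetical 1-D arrays that overlap at x = 2 and x = 3.
data_1 = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="x",
                      coords={"x": [0, 1, 2, 3]}, name="heat")
data_2 = xr.DataArray([30.0, 40.0, 50.0, 60.0], dims="x",
                      coords={"x": [2, 3, 4, 5]}, name="heat")

# join="inner" aligns on the intersection of x, so only the overlap survives.
overlap_only = xr.concat([data_1, data_2], "new_dim", join="inner").mean("new_dim")

# join="outer" aligns on the union of x and fills the gaps with NaN.
concatenated = xr.concat([data_1, data_2], "new_dim", join="outer")

# skipna=True ignores the NaN fill, so points covered by a single input
# pass through unchanged and overlapping points are averaged.
merged = concatenated.mean("new_dim", skipna=True)

print(overlap_only)
print(merged)
```

Swapping `mean` for another reduction such as `max` or `median` gives other overlap rules without a custom callback. One caveat under these assumptions: the stacked array holds one full-size copy of the union grid per input, so memory use grows with the number of inputs being combined.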