xarray
User defined function to deal with Merge Conflicts
Is your feature request related to a problem?
When merging data with overlapping values at the same coordinates, the only current options to avoid a MergeError
are to select the variable from the first dataset or to remove the offending areas.
This does not work when an element-wise merge is desired, where the resulting data comes from an operation on the overlapping data.
An example is two heatmap datasets with an overlap, where the overlap should be the average of the two separate datasets.
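The conflict described above can be reproduced with a small sketch. The two datasets and the variable name `heat` are made up for illustration, and `compat="no_conflicts"` and `join="outer"` are passed explicitly so the behaviour does not depend on the library defaults of a given xarray version.

```python
import xarray as xr

# Two hypothetical 1-D "heatmap" strips that share the coordinate x = 2
# but disagree on the value stored there (3.0 vs 9.0).
ds_1 = xr.Dataset({"heat": ("x", [1.0, 2.0, 3.0])}, coords={"x": [0, 1, 2]})
ds_2 = xr.Dataset({"heat": ("x", [9.0, 4.0, 5.0])}, coords={"x": [2, 3, 4]})

try:
    xr.merge([ds_1, ds_2], compat="no_conflicts", join="outer")
    conflict_raised = False
except xr.MergeError:
    # Differing non-null values at the same coordinate cannot be reconciled.
    conflict_raised = True

print(conflict_raised)
```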
Describe the solution you'd like
To allow expansion and customisability of this feature, I would like the ability to provide a user-defined function that receives the overlapping region element-wise and returns a single value to be merged into the final variable.
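import statistics
import xarray
def average_overlap(*values):
    # Reduce the values that overlap at one coordinate to a single value.
    return statistics.mean(values)
# Proposed API (merge_func does not exist in xarray yet),
# where data_1 & data_2 contain overlapping data.
xarray.merge([data_1, data_2], merge_func=average_overlap)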
My concerns for this feature are its scalability for large overlapping regions and its integration into the current merge code structure.
Describe alternatives you've considered
I've looked into implementing this myself within xarray, but cannot find the place to insert this feature.
As the data I am using is NetCDF, xarray remains an excellent tool to use, and I would rather not switch to another one.
Additional context
No response
Hi @HCookie, unless I'm missing something it may be possible to implement that fairly easily with concatenation then reduction? E.g., something like:
concatenated = xarray.concat([data_1, data_2], "new_dim", join="inner")
avg_overlap = concatenated.mean("new_dim")
@benbovy, an interesting approach; with some fiddling it may work. A verbatim implementation keeps only the overlap, however, whereas my use case requires all of the data to be kept and only the overlap to be combined. I'll look into it.
How about this?
concatenated = xarray.concat([data_1, data_2], "new_dim", join="outer")
avg_overlap = concatenated.mean("new_dim", skipna=True)
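As a minimal sketch of the outer-join suggestion above, the example below builds two small overlapping arrays and averages them along a temporary dimension. The sample data and the helper `merge_overlap` are hypothetical and not part of xarray. The helper generalises the idea to any NaN-aware reduction by way of `DataArray.reduce`, which may come close to the requested user-defined merge function.

```python
import numpy as np
import xarray as xr

# Hypothetical sample data: two strips that overlap at x = 2 and x = 3.
data_1 = xr.DataArray([1.0, 2.0, 3.0, 4.0], coords={"x": [0, 1, 2, 3]}, dims="x", name="heat")
data_2 = xr.DataArray([5.0, 6.0, 7.0, 8.0], coords={"x": [2, 3, 4, 5]}, dims="x", name="heat")


def merge_overlap(arrays, func):
    """Hypothetical helper: outer-concat along a temporary dim, then reduce it with func.

    func must accept an `axis` keyword and should ignore NaN, because the
    outer join pads every array with NaN where it has no data.
    """
    stacked = xr.concat(arrays, "new_dim", join="outer")
    return stacked.reduce(func, dim="new_dim")


averaged = merge_overlap([data_1, data_2], np.nanmean)  # mean in the overlap
largest = merge_overlap([data_1, data_2], np.nanmax)    # max in the overlap

print(averaged.values)
print(largest.values)
```

Values outside the overlap pass through unchanged because each reduction sees only one non-NaN entry there. Note that this stacks every input along a new dimension, so memory use grows with the number of arrays being combined.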