pandas2
Unified merge API
We have `merge()` and `merge_asof()`. There may even come a time when we perform functions on overlapping columns. As someone who wants to join two tables together, I just want a single mechanism to do so.
I wonder if it's possible to have a single API like:
    merge(
        left,     # DataFrame or Table
        right,    # DataFrame or Table
        on,       # one or more columns
        asof,     # one or more columns
        how,      # 'left', 'right', 'inner', 'outer'
        overlap,  # optional function to apply to overlapping column names
    )
Users must specify at least one of `on` or `asof`. There can also be `left_on`/`right_on` and `left_asof`/`right_asof`. We could even have `left_index`/`right_index` for the poor souls who still have indexed data (https://github.com/pydata/pandas-design/issues/17).
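To make the dispatch rule concrete, here is a minimal sketch built on the existing `pd.merge` and `pd.merge_asof`. The name `unified_merge` is hypothetical, and the sketch makes several assumptions that are not part of the proposal: `asof` is a single column (that is all `pd.merge_asof` accepts for `on`), only `how='left'` is supported when `asof` is given, and the `left_*`/`right_*` variants are omitted.

```python
import pandas as pd


def unified_merge(left, right, on=None, asof=None, how="left"):
    """Hypothetical wrapper: dispatch to pd.merge or pd.merge_asof."""
    if on is None and asof is None:
        raise ValueError("specify at least one of 'on' or 'asof'")
    if asof is None:
        # Exact-match join only: plain pd.merge handles every `how`.
        return pd.merge(left, right, on=on, how=how)
    if how != "left":
        # pd.merge_asof behaves like a left join; other modes are not sketched.
        raise NotImplementedError("this sketch supports only how='left' with asof")
    # merge_asof needs both frames sorted by the asof key; the exact-match
    # columns, if any, are passed through `by`.
    return pd.merge_asof(
        left.sort_values(asof), right.sort_values(asof), on=asof, by=on
    )


trades = pd.DataFrame(
    {"time": [1, 5, 10], "ticker": ["A", "A", "B"], "qty": [100, 200, 300]}
)
quotes = pd.DataFrame(
    {"time": [0, 4, 9], "ticker": ["A", "A", "B"], "bid": [1.0, 1.1, 2.0]}
)

# Exact match on ticker, most recent quote at or before each trade time.
as_of = unified_merge(trades, quotes, on="ticker", asof="time")

# Exact match only; the shared `time` column gets the default suffixes.
exact = unified_merge(trades, quotes, on="ticker", how="inner")
```

With both `on` and `asof` given, each trade picks up the latest quote for the same ticker; with `on` alone the call is just a regular `pd.merge`.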
The `overlap` argument is for when the same column name appears in both tables. Currently those columns are renamed with a suffix (though I'd be in favor of just raising an error). But there are times when I want to apply a function instead. There are ways to do this with arithmetic operations (https://github.com/pydata/pandas-design/issues/30), though I think any function with two arguments would be nice, including overwriting the left with the right (for handling cases of missing data with a "fill" result).
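A rough sketch of what `overlap` could do, emulated with today's `pd.merge`. The helper name `merge_with_overlap` and the temporary `_left`/`_right` suffixes are assumptions made for illustration (the suffixes would clash with real columns that already use those names), and raising when no function is supplied follows my stated preference rather than current pandas behavior.

```python
import pandas as pd


def merge_with_overlap(left, right, on, how="inner", overlap=None):
    """Hypothetical helper: combine same-named columns with a two-argument function."""
    keys = [on] if isinstance(on, str) else list(on)
    # Columns present in both tables that are not join keys.
    shared = [c for c in left.columns if c in right.columns and c not in keys]
    if shared and overlap is None:
        raise ValueError(f"overlapping columns {shared} and no overlap function")
    merged = pd.merge(left, right, on=keys, how=how, suffixes=("_left", "_right"))
    for col in shared:
        # Replace the two suffixed columns with a single combined column.
        left_values = merged.pop(f"{col}_left")
        right_values = merged.pop(f"{col}_right")
        merged[col] = overlap(left_values, right_values)
    return merged


base = pd.DataFrame({"key": [1, 2, 3], "value": [10.0, 20.0, 30.0]})
patch = pd.DataFrame({"key": [1, 2, 3], "value": [None, 25.0, None]})

# "Fill" semantics: take the right value where present, else keep the left.
filled = merge_with_overlap(
    base, patch, on="key", how="left",
    overlap=lambda l, r: r.combine_first(l),
)
```

Any two-argument function works in place of the lambda, for example `operator.add` for the arithmetic case.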
Note that this doesn't handle my proposed `merge_window()` (https://github.com/pydata/pandas/issues/13959). The semantics there are very specific and I'm not sure how to fit them into a unified structure like the one above, though I'd love to hear any ideas.