[Feature proposal] Dataframe merge by ID

Open adamsar opened this issue 3 years ago • 3 comments

I've got a few different dataframes that I'd like to merge when calculating a regression. Right now I do so by converting each one to a matrix of doubles, aligning the rows by id, and then rebuilding a dataframe. Spark and pandas have utility methods that let you merge dataframes with a by option that specifies which column is used to match the data.

Describe the solution you'd like: Extend the merge method with a simple by option that specifies the key to merge on, add a mergeWith method, or add a MergeOptions parameter that carries information such as by (the key to join on) and mergeType (inner vs. outer join, left vs. right join).

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html
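
To make the request concrete, here is a minimal sketch of the join-by-key semantics described above, written against plain Java collections rather than Smile's DataFrame. Every name in it (JoinSketch, JoinType, joinBy, row) is hypothetical and chosen only for illustration; none of it is existing Smile API, and only inner and left joins are covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: rows are column-name -> value maps,
// not Smile DataFrames.
final class JoinSketch {
    /** Hypothetical join types, mirroring the proposed mergeType option. */
    enum JoinType { INNER, LEFT }

    /** Joins two row lists on the column named by `by`. */
    static List<Map<String, Object>> joinBy(List<Map<String, Object>> left,
                                            List<Map<String, Object>> right,
                                            String by, JoinType type) {
        // Index the right-hand rows by key so each lookup is a hash probe.
        Map<Object, List<Map<String, Object>>> index = new LinkedHashMap<>();
        for (Map<String, Object> r : right) {
            index.computeIfAbsent(r.get(by), k -> new ArrayList<>()).add(r);
        }

        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> l : left) {
            List<Map<String, Object>> matches = index.get(l.get(by));
            if (matches == null) {
                // LEFT keeps unmatched left rows as-is (right columns are
                // simply absent); INNER drops them.
                if (type == JoinType.LEFT) out.add(new LinkedHashMap<>(l));
                continue;
            }
            for (Map<String, Object> r : matches) {
                Map<String, Object> merged = new LinkedHashMap<>(l);
                r.forEach(merged::putIfAbsent); // left values win on name clashes
                out.add(merged);
            }
        }
        return out;
    }

    /** Builds a row from alternating column names and values. */
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> features =
            List.of(row("id", 1, "x", 0.5), row("id", 2, "x", 1.5));
        List<Map<String, Object>> targets =
            List.of(row("id", 2, "y", 3.0), row("id", 3, "y", 4.0));
        System.out.println(joinBy(features, targets, "id", JoinType.INNER));
        System.out.println(joinBy(features, targets, "id", JoinType.LEFT));
    }
}
```

Indexing one side by key avoids the manual row alignment step; a real implementation would also need to handle right and outer joins, typed columns, and missing values.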

adamsar avatar Aug 05 '21 23:08 adamsar

Are you interested in a join or a simple merge? With the existing API, you can merge two or more data frames, provided that their rows are in the same order.

haifengl avatar Aug 10 '21 17:08 haifengl

More of a join. I've got a lot of dataframes, including some I receive from other departments, and it's sometimes painful to get these into a cohesive, single dataframe that contains the feature set I need.

Edit: this is exactly the functionality I'd like: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html

adamsar avatar Aug 10 '21 23:08 adamsar

We added smile.data.SQL for database management, which supports joins. The query/join result is returned as a DataFrame. See SQLTest for examples.

haifengl avatar Apr 06 '24 20:04 haifengl