
Speed issue on merging data frames

Open zahidaliyounis opened this issue 5 years ago • 1 comment

Merging two data frames (121K and 31K rows) takes a few minutes to complete in Renjin. The same merge takes a few seconds in RStudio.

SalesOrderDetail: 121K records
SalesOrderHeader: 31K records

SalesOrderDetail <- read.csv('PATH', sep='\t', header = TRUE)
SalesOrderHeader <- read.csv('PATH', sep='\t', header = TRUE)
merged <- merge(x = SalesOrderHeader, y = SalesOrderDetail, by.x = 'SalesOrderID', by.y = 'SalesOrderID')

The data is from the Microsoft Adventure Works database. The CSV files can be downloaded below:

AdventureWorksSales.zip

zahidaliyounis, Feb 14 '20 11:02

Renjin currently uses a version of merge() written in pure R to replace the (internal) C implementation from GNU R. See #10. This is probably why performance is not optimal.

mjkallen, Feb 18 '20 08:02
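
For a join like the one in this issue, a lookup based on match() may be worth trying as a workaround while merge() is slow. The sketch below is only an illustration under stated assumptions: the key (SalesOrderID) is unique in the header table, the two tables share no column names other than the key, and only rows with a matching key are wanted. lookup_join is a hypothetical helper, not part of base R or Renjin, and the small data frames are synthetic stand-ins for the AdventureWorks files. Whether this is actually faster in Renjin has not been measured here.

```r
# Hypothetical helper: join `detail` to `header` through match().
# Assumes `key` is unique in `header` and that the two data frames
# share no other column names (merge() would add suffixes; this does not).
lookup_join <- function(header, detail, key) {
  # Row position in `header` for each key in `detail` (NA if absent)
  idx <- match(detail[[key]], header[[key]])
  keep <- !is.na(idx)
  # Header columns other than the key, repeated to line up with detail rows
  extra <- header[idx[keep], setdiff(names(header), key), drop = FALSE]
  out <- cbind(detail[keep, , drop = FALSE], extra)
  rownames(out) <- NULL
  out
}

# Small synthetic stand-ins for SalesOrderHeader and SalesOrderDetail
header <- data.frame(SalesOrderID = 1:3,
                     CustomerID = c(10L, 20L, 30L))
detail <- data.frame(SalesOrderID = c(1L, 1L, 2L, 3L, 3L, 4L),
                     OrderQty = c(5L, 1L, 2L, 7L, 3L, 9L))

joined <- lookup_join(header, detail, "SalesOrderID")

# Reference result from merge() for comparison
reference <- merge(header, detail, by = "SalesOrderID")
```

Wrapping each call in system.time() on the full data set would show whether the lookup helps in a given environment. Column order and row order can differ from what merge() returns, so compare results by column name rather than by position.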