How could I join two data frames by specifying columns from each data frame?

Open DareUrDream opened this issue 6 years ago • 2 comments

Hi,

I have a situation where there are two data frames with no common column names. How can I join them? I want to join them on different pairs of columns, one pair at a time, to produce various outputs.

Is it possible to join two DFs by specifying the mapping column(s) from each DF?

Cheers, DareUrDream

DareUrDream · Jan 24 '19 14:01

Try using the joinOn method with the column names for which the values match. Alternatively, you can use the joinOn method with a function that computes the join key.

cardillo · Jan 24 '19 17:01

Hi @cardillo,

For the time being I have achieved it by renaming columns in one of the data sets. But now I have hit another bottleneck: the join fails with the stack trace below. I am not sure how to prepare a unique key so that the join works.

resource.txt agentstatedetail1m_copy.txt

Stack trace

Exception in thread "main" java.lang.IllegalArgumentException: generated key is not unique: [3]
	at joinery.impl.Combining.join(Combining.java:45)
	at joinery.impl.Combining.joinOn(Combining.java:102)
	at joinery.DataFrame.joinOn(DataFrame.java:730)
	at joinery.DataFrame.joinOn(DataFrame.java:756)
	at com.cisco.evaluate.joinery.JoineryTestMain.startEvaluation(JoineryTestMain.java:37)
	at com.cisco.evaluate.joinery.JoineryTestMain.main(JoineryTestMain.java:18)

Code Below

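```java
// Load the resource table and keep only the columns of interest.
DataFrame<Object> rsrcDf = DataFrame.readCsv(ClassLoader.getSystemResourceAsStream("resource.csv"))
		.retain("resourceid", "resourceloginid", "resourcename", "resourcegroupid", "extension",
				"resourceskillmapid", "assignedteamid", "resourcefirstname", "resourcelastname");

// Load the agent state details; only the agent id and event type are needed.
DataFrame<Object> asdDf = DataFrame.readCsv(ClassLoader.getSystemResourceAsStream("agentstatedetail1m_copy.csv"))
		.retain("agentid", "eventtype");

// Rename the key column so that both frames share the name "resourceid".
asdDf = asdDf.rename("agentid", "resourceid");

// Left join on the shared column; this is the call that throws.
DataFrame<Object> joinedDf = asdDf.joinOn(rsrcDf, JoinType.LEFT, "resourceid");
System.out.println("Final row count: " + joinedDf.length());
```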

Cheers, DareUrDream

DareUrDream · Jan 25 '19 02:01
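
A note on the exception above: the message `generated key is not unique: [3]` suggests that more than one row produced the join key `3`, which would be expected if the agent state file holds several events per agent. The sketch below is plain Java and does not use joinery at all. The class `UniqueKeyJoinSketch` and its methods `indexByKey` and `firstPerKey` are hypothetical names made up for illustration, and the sample rows are invented. It only shows, under the assumption that the join indexes each frame by key and rejects duplicates, why such a check fails and how reducing the data to one row per key first avoids it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniqueKeyJoinSketch {

    // Hypothetical helper: index rows by the value in column keyCol and
    // reject duplicate keys, similar in spirit to the error in the stack trace.
    public static Map<Object, List<Object>> indexByKey(List<List<Object>> rows, int keyCol) {
        Map<Object, List<Object>> index = new LinkedHashMap<>();
        for (List<Object> row : rows) {
            Object key = row.get(keyCol);
            if (index.putIfAbsent(key, row) != null) {
                throw new IllegalArgumentException("generated key is not unique: [" + key + "]");
            }
        }
        return index;
    }

    // Hypothetical workaround: keep only the first row seen for each key.
    public static List<List<Object>> firstPerKey(List<List<Object>> rows, int keyCol) {
        Map<Object, List<Object>> seen = new LinkedHashMap<>();
        for (List<Object> row : rows) {
            seen.putIfAbsent(row.get(keyCol), row);
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        // Invented sample data: agent 3 has two events, so key 3 appears twice.
        List<List<Object>> events = Arrays.asList(
                Arrays.<Object>asList(3, "LOGIN"),
                Arrays.<Object>asList(3, "READY"),
                Arrays.<Object>asList(7, "LOGIN"));

        try {
            indexByKey(events, 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // generated key is not unique: [3]
        }

        // After reducing to one row per key, indexing succeeds.
        System.out.println(indexByKey(firstPerKey(events, 0), 0).keySet()); // [3, 7]
    }
}
```

Whether keeping the first row per agent is acceptable depends on the data. If every event matters, another option may be to swap the roles of the two frames or to aggregate the events per agent before joining.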