
Join key disambiguation inconsistent with pandas

Open · dbanda opened this issue 5 years ago · 1 comment

Join in koalas breaks when the on keyword is used to join on a column that both DataFrames share.

import databricks.koalas as ks

a = ks.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
b = ks.DataFrame({"B": [11, 12, 13], "C": [14, 15, 16]})
a.join(b, on="B", lsuffix="_left")

The above works in pandas, but in koalas it raises a PySpark error: pyspark.sql.utils.AnalysisException: "Reference 'B' is ambiguous, could be: right_table.B, B.;"
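The name disambiguation that pandas performs in this call can be modeled with a small pure-Python sketch. A column name present on both sides gets lsuffix on the left and rsuffix on the right, and an overlap with both suffixes empty is rejected. The helper overlap_with_suffix is hypothetical, written only for illustration. It mirrors the naming rule, not pandas internals, and its error message is not the pandas one.

```python
def overlap_with_suffix(left_cols, right_cols, lsuffix="", rsuffix=""):
    """Toy model of how overlapping column names are renamed in a join.

    Hypothetical helper: columns present on both sides get the matching
    suffix; an overlap with both suffixes empty is treated as an error.
    """
    overlap = set(left_cols) & set(right_cols)
    if overlap and not lsuffix and not rsuffix:
        raise ValueError(f"columns overlap but no suffix specified: {sorted(overlap)}")
    left = [c + lsuffix if c in overlap else c for c in left_cols]
    right = [c + rsuffix if c in overlap else c for c in right_cols]
    return left, right


# Column names from the report: a has A, B and b has B, C.
left, right = overlap_with_suffix(["A", "B"], ["B", "C"], lsuffix="_left")
print(left, right)  # ['A', 'B_left'] ['B', 'C']
```

With lsuffix="_left" the four output names A, B_left, B and C are all distinct, so nothing downstream has to resolve a bare B. The Spark error above is a reference to B that was left ambiguous instead.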

dbanda · Jan 11 '21 09:01

Thanks for raising this!

Currently, DataFrame.join requires the join key column (here, B) to be the index of the right argument (here, b). Unfortunately, this doesn't match pandas behavior.
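For comparison, pandas' join(..., on=...) looks up each value of the caller's on column in the index of the other frame. A same-named column on the right is therefore ordinary data that only needs a suffix. Below is a minimal pure-Python sketch of that left-join lookup. join_on and its list-of-dicts inputs are hypothetical stand-ins, not a pandas or Koalas API, and None stands in for pandas' NaN.

```python
def join_on(left_rows, on, right_by_index, right_cols, lsuffix="", rsuffix=""):
    """Toy left join: match each left row's `on` value against the right *index*.

    Hypothetical helper. `right_by_index` maps an index label to a row dict;
    overlapping column names are suffixed, unmatched rows get None.
    """
    left_cols = list(left_rows[0]) if left_rows else []
    overlap = set(left_cols) & set(right_cols)
    if overlap and not lsuffix and not rsuffix:
        raise ValueError("columns overlap but no suffix specified")
    out = []
    for row in left_rows:
        match = right_by_index.get(row[on])  # index lookup, not a column lookup
        new = {(c + lsuffix if c in overlap else c): v for c, v in row.items()}
        for c in right_cols:
            name = c + rsuffix if c in overlap else c
            new[name] = match[c] if match is not None else None
        out.append(new)
    return out


# Data from the report; b keeps its default index 0, 1, 2.
a_rows = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
b_by_index = {0: {"B": 11, "C": 14}, 1: {"B": 12, "C": 15}, 2: {"B": 13, "C": 16}}
result = join_on(a_rows, "B", b_by_index, ["B", "C"], lsuffix="_left")
print(result[0])  # {'A': 1, 'B_left': 4, 'B': None, 'C': None}
```

In this model the values 4, 5 and 6 from a's B column are compared with b's index labels 0, 1 and 2, not with b's B column, so no row finds a match. b's own B column is carried along under its (possibly suffixed) name.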

I'll look into this to see how we can improve this!

xinrong-meng · Jan 12 '21 22:01