koalas
Join key disambiguation inconsistent with pandas
Join in koalas breaks when the on keyword is supplied to join on a shared column.
import databricks.koalas as ks
a = ks.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
b = ks.DataFrame({"B": [11, 12, 13], "C": [14, 15, 16]})
a.join(b, on="B", lsuffix="_left")
The above works in pandas, but in koalas it throws a PySpark error: pyspark.sql.utils.AnalysisException: "Reference 'B' is ambiguous, could be: right_table.B, B.;"
Thanks for raising this!
Currently, DataFrame.join requires the join key column (here, B) to be the index of the right DataFrame (here, b). Unfortunately, this doesn't match pandas behavior.
I'll look into how we can improve this!
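For reference, here is a minimal sketch of the two call shapes discussed above, written against pandas so that it runs without a Spark session. In pandas, on="B" names a column of the caller, and that column is matched against the index of the other frame. The second call moves B into the index of the right frame, which is the shape the comment above describes as required by koalas. Whether swapping pd for ks makes the second call succeed in koalas is an assumption based on that comment and is not verified here.

```python
import pandas as pd

a = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
b = pd.DataFrame({"B": [11, 12, 13], "C": [14, 15, 16]})

# Call from the report: column B of `a` is matched against the
# default integer index of `b`. The overlapping column name B is
# disambiguated by lsuffix, so pandas does not raise.
pandas_result = a.join(b, on="B", lsuffix="_left")

# Shape described as required: the join key is the index of the
# right frame, so there is no overlapping column and no suffix is needed.
indexed_result = a.join(b.set_index("B"), on="B")

print(pandas_result)
print(indexed_result)
```

With this sample data no value of a.B appears in either the index or the B column of b, so both left joins leave C empty. The sketch only illustrates which call shapes are accepted, not a meaningful match.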