dask-sql
[BUG] Extra fields returned for left semi and left anti joins
What happened:
When performing a left semi or left anti join, getFieldList and getFieldNames return one extra field, which we need to filter out:
https://github.com/dask-contrib/dask-sql/blob/5421bbf9b363ab15c657432e0c9d367c6f236df7/dask_sql/context.py#L849
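For reference, the rule the field list should follow can be written as a small helper. This is a hypothetical sketch, not dask-sql code: the function `expected_field_names` and the join-type labels `LEFTSEMI`/`LEFTANTI` are assumptions chosen for illustration. Semi and anti joins only filter rows of the left input, so their output schema is exactly the left input's fields. Inner and outer joins concatenate the fields of both sides.

```python
from typing import List

# Hypothetical join-type labels; dask-sql's own identifiers may differ
LEFT_ONLY_JOIN_TYPES = {"LEFTSEMI", "LEFTANTI"}


def expected_field_names(
    join_type: str, left_fields: List[str], right_fields: List[str]
) -> List[str]:
    """Column names a SELECT * over the given join type should produce."""
    if join_type.upper() in LEFT_ONLY_JOIN_TYPES:
        # Semi and anti joins only filter rows of the left input,
        # so none of the right input's columns appear in the output
        return list(left_fields)
    # Inner and outer joins emit the left columns followed by the right ones
    return list(left_fields) + list(right_fields)


# Column layout of the tables dfa(id, a) and dfb(id, b) used in this issue
print(expected_field_names("LEFTANTI", ["id", "a"], ["id", "b"]))  # ['id', 'a']
print(expected_field_names("INNER", ["id", "a"], ["id", "b"]))  # ['id', 'a', 'id', 'b']
```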
Minimal Complete Verifiable Example:
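import pandas as pd
from dask_sql import Context
c = Context()
dfa = pd.DataFrame({"id": [1, 2, 2, 4], "a": ["a", "b", "c", "d"]})
dfb = pd.DataFrame({"id": [2, 3, 3, 4], "b": ["e", "f", "g", "h"]})
# gpu=True requires a CUDA environment with dask_cudf installed; drop it to run on CPU
c.create_table("dfa", dfa, gpu=True)
c.create_table("dfb", dfb, gpu=True)
# A left anti join keeps only the dfa rows with no match in dfb, so only dfa's columns (id, a) are expected
query = "SELECT * FROM dfa LEFT ANTI JOIN dfb ON dfa.id = dfb.id"
res = c.sql(query).compute()
print(res)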
This should return only 2 field names from getFieldNames (id and a), yet it returns 3.
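As a cross-check of what the query above should return, the same semi and anti joins can be expressed directly in pandas on the example data. This is a reference sketch that uses `Series.isin` on the single join key, not how dask-sql evaluates the join; it also ignores NULL handling, which the example data does not exercise.

```python
import pandas as pd

dfa = pd.DataFrame({"id": [1, 2, 2, 4], "a": ["a", "b", "c", "d"]})
dfb = pd.DataFrame({"id": [2, 3, 3, 4], "b": ["e", "f", "g", "h"]})

# Left semi join: rows of dfa whose id appears in dfb, keeping only dfa's columns
semi = dfa[dfa["id"].isin(dfb["id"])]

# Left anti join: rows of dfa whose id does not appear in dfb, keeping only dfa's columns
anti = dfa[~dfa["id"].isin(dfb["id"])]

print(list(semi.columns))  # ['id', 'a']
print(list(anti.columns))  # ['id', 'a']
print(anti["id"].tolist())  # [1]
```

For multi-column join keys, `dfa.merge(dfb, on=keys, how="left", indicator=True)` followed by filtering on the `_merge` column and selecting dfa's columns generalizes the same idea.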