
[BUG] Extra fields returned for left semi and left anti joins

Open ChrisJar opened this issue 2 years ago • 0 comments

What happened: When performing a left semi or left anti join, getFieldList and getFieldNames return an extra field that we need to filter out: https://github.com/dask-contrib/dask-sql/blob/5421bbf9b363ab15c657432e0c9d367c6f236df7/dask_sql/context.py#L849
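
To make "filter out" concrete, here is a rough sketch of one possible filter. It is not dask-sql's code: `keep_left_fields` is a hypothetical helper, and the name `"extra"` is a placeholder, since this report does not say what the extra field is called. The sketch assumes the unwanted field can be recognized by not being a column of the left table.

```python
from typing import List


def keep_left_fields(field_names: List[str], left_columns: List[str]) -> List[str]:
    """Hypothetical helper: keep only names that are columns of the left table.

    Assumes the extra field's name does not collide with a left-table column.
    """
    left = set(left_columns)
    # Preserve the original order of the reported field names.
    return [name for name in field_names if name in left]


# "extra" is a made-up stand-in for the unexpected third field.
reported = ["id", "a", "extra"]
print(keep_left_fields(reported, ["id", "a"]))
```

A name-based filter is only one option; if the extra field always comes last, truncating to the left table's column count would also work.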

Minimal Complete Verifiable Example:

import pandas as pd
from dask_sql import Context

c = Context()

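dfa = pd.DataFrame({"id": [1, 2, 2, 4], "a": ["a", "b", "c", "d"]})
dfb = pd.DataFrame({"id": [2, 3, 3, 4], "b": ["e", "f", "g", "h"]})
# gpu=True registers the tables as GPU-backed and requires cuDF
c.create_table("dfa", dfa, gpu=True)
c.create_table("dfb", dfb, gpu=True)

# A left anti join keeps only the rows of dfa that have no match in dfb
query = "SELECT * FROM dfa LEFT ANTI JOIN dfb ON dfa.id = dfb.id"
res = c.sql(query).compute()
print(res)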

This query should result in only 2 field names being returned by getFieldNames (the two columns of dfa), yet it returns 3.
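
For reference, the expected shape of the result can be reproduced in plain pandas, independent of dask-sql. This is a minimal sketch of left semi and left anti join semantics using `Series.isin` on the same data; it shows what the output columns should be, not how dask-sql computes the join.

```python
import pandas as pd

dfa = pd.DataFrame({"id": [1, 2, 2, 4], "a": ["a", "b", "c", "d"]})
dfb = pd.DataFrame({"id": [2, 3, 3, 4], "b": ["e", "f", "g", "h"]})

# True for rows of dfa whose id has at least one match in dfb
matched = dfa["id"].isin(dfb["id"])

# Left semi join: rows of dfa with a match; only dfa's columns survive
left_semi = dfa[matched].reset_index(drop=True)

# Left anti join: rows of dfa without a match; only dfa's columns survive
left_anti = dfa[~matched].reset_index(drop=True)

print(list(left_semi.columns))  # ['id', 'a']
print(list(left_anti.columns))  # ['id', 'a']
print(left_anti)
```

In both cases the result carries only the columns of the left table, which is why two field names are expected here.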

ChrisJar • Jun 30 '23 20:06