polars icon indicating copy to clipboard operation
polars copied to clipboard

SQL cross join incorrect filter plan

Open ion-elgreco opened this issue 2 years ago • 2 comments

Checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

df = pl.DataFrame({
    "id":['1','2','3']
})

df2 = pl.DataFrame({
    "id":['4','5','6']
})
ctx = pl.SQLContext(df=df, df2=df2)

ctx.execute(
    """
    SELECT *
    FROM df
        CROSS JOIN df2
    WHERE df.id = df2.id
    """,
    eager=False,
)

naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)
SELECT [col("id"), col("id_right")] FROM
FILTER [(col("id")) == (col("id"))] FROM
CROSS JOIN:
LEFT PLAN ON: []
DF ["id"]; PROJECT */1 COLUMNS; SELECTION: "None"
RIGHT PLAN ON: []
DF ["id"]; PROJECT */1 COLUMNS; SELECTION: "None"
END CROSS JOIN

Log output

No response

Issue description

The filter plan is incorrect on the right-hand side of the join. In the select you can see the same columns got renamed to _right, but in the filter it doesn't properly recognize df2.id as df2.id_right. When I pass id_right it works.

Expected behavior

Pass the correct filter down in the plan.

Installed versions

--------Version info---------
Polars:              0.19.10
Index type:          UInt32
Platform:            Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Python:              3.10.12 (main, Aug  9 2023, 14:47:34) [GCC 9.4.0]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         2.2.1
connectorx:          <not installed>
deltalake:           0.12.0
fsspec:              2023.9.2
gevent:              <not installed>
matplotlib:          <not installed>
numpy:               1.23.5
openpyxl:            <not installed>
pandas:              1.5.3
pyarrow:             11.0.0
pydantic:            1.10.13
pyiceberg:           <not installed>
pyxlsb:              <not installed>
sqlalchemy:          2.0.22
xlsx2csv:            0.8.1
xlsxwriter:          3.1.9

ion-elgreco avatar Oct 21 '23 19:10 ion-elgreco

Is the output incorrect? Can you reproduce with the polars dataframe API?

Could you show me an example with the incorrect output.

ritchie46 avatar Oct 23 '23 08:10 ritchie46

@ritchie46

SQL output (wrong):

shape: (9, 2)
id id_right
str str
"1" "4"
"1" "5"
"1" "6"
"2" "4"
"2" "5"
"2" "6"
"3" "4"
"3" "5"
"3" "6"

DataFrame API output (correct) df.join(df2, how="cross").filter(pl.col("id") == pl.col("id_right"))

shape: (0, 2)
idid_right
strstr

ion-elgreco avatar Oct 23 '23 10:10 ion-elgreco