An inner join in SQL returns wrong results
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)
# Register the same frame under two names so it can be joined with itself.
context = pl.SQLContext()
context.register("t", df)
context.register("t1", df)
>>> lf = context.execute("select t.A,t.fruits,t1.B,t1.cars from t,t1 where t.A=t1.B")
>>> lf.collect()
shape: (1, 4)
┌─────┬────────┬─────┬────────┐
│ A   ┆ fruits ┆ B   ┆ cars   │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 3   ┆ apple  ┆ 3   ┆ beetle │
└─────┴────────┴─────┴────────┘
>>> lf = context.execute("select t.A,t.fruits,t1.B,t1.cars from t join t1 on t.A=t1.B")
>>> lf.collect()
shape: (5, 4)
┌─────┬────────┬─────┬────────┐
│ A   ┆ fruits ┆ B   ┆ cars   │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 5   ┆ banana ┆ 1   ┆ beetle │
│ 4   ┆ apple  ┆ 2   ┆ beetle │
│ 3   ┆ apple  ┆ 3   ┆ beetle │
│ 2   ┆ banana ┆ 4   ┆ audi   │
│ 1   ┆ banana ┆ 5   ┆ beetle │
└─────┴────────┴─────┴────────┘
Log output
No response
Issue description
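The first query (implicit join with a WHERE clause) should return 5 rows, but it returns only 1 row.
The second query (explicit JOIN ... ON) does return 5 rows, but they are the wrong ones: every returned row should satisfy t.A = t1.B, yet only 1 of them does (t.A = t1.B = 3).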
Expected behavior
Both queries should return the same 5 rows, like this:
┌─────┬────────┬─────┬────────┐
│ A   ┆ fruits ┆ B   ┆ cars   │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 1   ┆ banana ┆ 1   ┆ beetle │
│ 2   ┆ banana ┆ 2   ┆ beetle │
│ 3   ┆ apple  ┆ 3   ┆ beetle │
│ 4   ┆ apple  ┆ 4   ┆ audi   │
│ 5   ┆ banana ┆ 5   ┆ beetle │
└─────┴────────┴─────┴────────┘
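For reference, the expected rows can be cross-checked without Polars. The following is a minimal sketch in plain Python that applies the join condition t.A = t1.B to the data from the reproducible example. The helper name `inner_join_rows` is hypothetical and only illustrates the expected semantics; it is not part of Polars.

```python
# Plain-Python cross-check of the expected inner-join result (no Polars needed).
# Each tuple is one row of the example frame: (A, fruits, B, cars).
rows = list(
    zip(
        [1, 2, 3, 4, 5],
        ["banana", "banana", "apple", "apple", "banana"],
        [5, 4, 3, 2, 1],
        ["beetle", "audi", "beetle", "beetle", "beetle"],
    )
)


def inner_join_rows(t, t1):
    """Hypothetical helper: return (t.A, t.fruits, t1.B, t1.cars)
    for every pair of rows where t.A == t1.B."""
    return [
        (a, fruits, b, cars)
        for (a, fruits, _b, _cars) in t      # columns taken from t
        for (_a, _fruits, b, cars) in t1     # columns taken from t1
        if a == b                            # join condition: t.A = t1.B
    ]


expected = inner_join_rows(rows, rows)
for row in expected:
    print(row)
```

Under these assumptions the sketch yields five rows, each with A equal to B, which is what both SQL queries are expected to return (row order aside).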
Installed versions
--------Version info---------
Polars: 0.20.3
Index type: UInt32
Platform: Windows-7-6.1.7601-SP1
Python: 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:18:16) [MSC v.1928 64 bit (AMD64)]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: 2.0.0
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2021.11.1
gevent: 22.10.2
hvplot: <not installed>
matplotlib: 3.3.4
numpy: 1.23.4
openpyxl: 3.1.1
pandas: 1.3.2
pyarrow: 6.0.1
pydantic: 1.8.2
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>