polars-cli
Inconsistent COUNT(*) and GROUP BY behavior in Polars CLI
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of the Polars CLI.
Reproducible example
# generate test.csv
cat<<EOF > test.csv
a
test
test
test2
test3
EOF
# run group by query
echo "SELECT COUNT(*) AS _count, a FROM read_csv('test.csv') GROUP BY a;" | polars
Output
┌────────┬───────┐
│ _count ┆ a     │
│ ---    ┆ ---   │
│ u32    ┆ str   │
╞════════╪═══════╡
│ 3      ┆ test2 │
│ 3      ┆ test3 │
│ 3      ┆ test  │
└────────┴───────┘
Issue description
COUNT(*) seemingly ignores the GROUP BY: every group reports the same count (3) instead of its own row count (2 for test, 1 each for test2 and test3).
Expected behavior
# polarstest.py
import polars as pl

df = pl.read_csv('test.csv')

# Run the same query through the Python SQLContext for comparison
with pl.SQLContext(register_globals=True, eager=True) as ctx:
    df_small = ctx.execute("SELECT COUNT(*) AS _count, a FROM df GROUP BY a")
    print(df_small)
python3 polarstest.py
shape: (3, 2)
┌────────┬───────┐
│ _count ┆ a     │
│ ---    ┆ ---   │
│ u32    ┆ str   │
╞════════╪═══════╡
│ 2      ┆ test  │
│ 1      ┆ test3 │
│ 1      ┆ test2 │
└────────┴───────┘
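The expected per-group counts can also be cross-checked without Polars. The following is a minimal sketch using only the Python standard library. The helper name `group_counts` and the inlined `CSV_TEXT` are assumptions made for illustration (the text mirrors test.csv from the reproducible example). It is not part of Polars or the Polars CLI.

```python
import csv
import io
from collections import Counter

# Same contents as test.csv from the reproducible example (inlined here
# so the sketch is self-contained; this constant is illustrative only).
CSV_TEXT = "a\ntest\ntest\ntest2\ntest3\n"


def group_counts(csv_text: str, column: str) -> Counter:
    """Count rows per distinct value of `column`.

    This mirrors what `SELECT COUNT(*), column ... GROUP BY column`
    is expected to return for each group.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


counts = group_counts(CSV_TEXT, "a")
for value, n in counts.items():
    print(n, value)
```

The file has four data rows in three groups, so a correct GROUP BY should give 2 for test and 1 each for test2 and test3, matching the Python SQLContext output above.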
Installed version
0.8.0