polars
Error constructing DataFrame from dict scalar
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl
print(pl.__version__)
pl.DataFrame({"test": 0, "nest": {"test": 0, "test2": 0}})
Log output
Traceback (most recent call last):
File "/Users/lheinz/work/jupyter/pl.py", line 4, in <module>
pl.DataFrame({"test": 0, "nest": {"test": 0, "test2": 0}})
File "/Users/lheinz/work/jupyter/.venv/lib/python3.12/site-packages/polars/dataframe/frame.py", line 373, in __init__
self._df = dict_to_pydf(
^^^^^^^^^^^^^
File "/Users/lheinz/work/jupyter/.venv/lib/python3.12/site-packages/polars/_utils/construction/dataframe.py", line 142, in dict_to_pydf
pydf = PyDataFrame(data_series)
^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ShapeError: could not create a new DataFrame: series "test" has length 2 while series "nest" has length 1
Issue description
When constructing a DataFrame from a dict that contains a nested dict, Polars raises an error: ShapeError: could not create a new DataFrame: series "test" has length 2 while series "nest" has length 1
The error also reports the wrong dimensions: "test" is a single scalar, so it should have length 1, not 2.
Expected behavior
This should produce a DataFrame with a struct column.
~~Was working in older versions~~
Installed versions
--------Version info---------
Polars: 0.20.16
Index type: UInt32
Platform: macOS-14.4-arm64-arm-64bit
Python: 3.12.1 (main, Jan 7 2024, 23:31:12) [Clang 16.0.3 ]
@stinodego: linked to the constructor refactor?
Probably. I'm not sure this ever used to work though. The problem seems to be in arrlen where it infers length 2 for the struct 'scalar'.
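To illustrate how a length of 2 could come out of a struct 'scalar', here is a minimal sketch. It is not the actual Polars code: `naive_len` is a hypothetical stand-in that assumes the length inference treats anything supporting `len()` (other than a string) as a sequence. Under that assumption a dict reports its number of keys, which would match the "length 2" in the ShapeError above.

```python
from typing import Any, Optional


def naive_len(obj: Any) -> Optional[int]:
    """Hypothetical length inference (NOT Polars's arrlen).

    Returns len(obj) for anything that supports it, except strings,
    and None for values that look like scalars.
    """
    if isinstance(obj, str):
        return None
    try:
        return len(obj)
    except TypeError:
        return None


struct_scalar = {"test": 0, "test2": 0}

inferred = [
    naive_len(0),                # plain int: no len(), treated as a scalar
    naive_len(struct_scalar),    # dict: len() counts keys, not rows
    naive_len([struct_scalar]),  # one-element list: a proper one-row column
]
print(inferred)
```

If the real helper behaves like this, the two-key dict would be mistaken for a two-row column, which is one plausible reading of the reported lengths.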
@stinodego: Ahh, I think you're right - I assumed this was a regression, but installing some earlier versions of Polars to check further makes me think it is not ;)
@linusheinz: Did this ever work for you in a different version of Polars? (And, if so, which version?)
I just checked back to 20.9, and it did not work. It seems to be present in many versions.
Note that the easiest way to address this is to provide sequence data in the dict, which is the expected form (as each value in the dict represents a column).
pl.DataFrame({"test": [0], "nest": [{"test": 0, "test2": 0}]})
# shape: (1, 2)
# ┌──────┬───────────┐
# │ test ┆ nest      │
# │ ---  ┆ ---       │
# │ i64  ┆ struct[2] │
# ╞══════╪═══════════╡
# │ 0    ┆ {0,0}     │
# └──────┴───────────┘
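If the input arrives as a single record (scalars and nested dicts rather than columns), the wrapping shown above can be done generically. This is only a sketch: `wrap_scalars` is a hypothetical helper, not part of Polars, and it assumes that only `list` and `tuple` values count as existing column data.

```python
from typing import Any


def wrap_scalars(data: dict[str, Any]) -> dict[str, Any]:
    """Wrap every value that is not a list/tuple in a one-element list.

    Hypothetical helper: makes each dict value column-shaped so that a
    nested dict becomes a single struct value rather than a 'sequence'.
    """
    return {
        name: value if isinstance(value, (list, tuple)) else [value]
        for name, value in data.items()
    }


row = {"test": 0, "nest": {"test": 0, "test2": 0}}
columns = wrap_scalars(row)
print(columns)
```

The resulting `columns` dict has the same shape as the argument passed to `pl.DataFrame` in the workaround above. Note that this helper would also wrap things like `range(...)`, so it is only suitable for single-record input.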
Passing non-sequence data here is a little hit and miss, as we typically expand it in a fast-path for quick generation of large frames with repetitive data, like so:
pl.DataFrame({"n": range(1000), "y": -1})
# shape: (1_000, 2)
# ┌─────┬─────┐
# │ n   ┆ y   │
# │ --- ┆ --- │
# │ i64 ┆ i32 │
# ╞═════╪═════╡
# │ 0   ┆ -1  │
# │ 1   ┆ -1  │
# │ …   ┆ …   │
# │ 998 ┆ -1  │
# │ 999 ┆ -1  │
# └─────┴─────┘