polars
OutOfSpec("A StructArray must contain at least one field") error when passing dataclass with empty dict field
Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Issue description
I use polars by passing a list of Python dataclasses into a DataFrame and then performing operations on that DataFrame. One of our dataclasses has a field of type dict, and the dict can sometimes be empty. If the dict field is empty, I get the following error:
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: OutOfSpec("A StructArray must contain at least one field")
This is likely because polars tries to convert that field into a struct, and a struct must have at least one field. I don't have this issue when passing dataclasses into pandas (presumably because pandas has no struct type).
Reproducible example
import polars as pl
from dataclasses import dataclass
@dataclass
class TestClass:
    int_field: int
    dict_field: dict
t = TestClass(int_field=10, dict_field={})
df = pl.DataFrame([t])
Expected behavior
I'd expect it to convert the dict into an object datatype or some other datatype. Or, at the very least, to fail with a more understandable error.
Installed versions
---Version info---
Polars: 0.15.10
Index type: UInt32
Platform: macOS-13.1-arm64-arm-64bit
Python: 3.10.4 (main, Aug 4 2022, 14:12:36) [Clang 13.1.6 (clang-1316.0.21.2.3)]
---Optional dependencies---
pyarrow: <not installed>
pandas: 1.5.2
numpy: 1.24.1
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: <not installed>
I'd just add that if we were able to define the struct ahead of time, that would solve this for us, but I don't see how I would do that.
I also can't get past this by defining the columns param. If the object has a dict field, it errors out no matter what.
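One possible workaround, as a rough sketch only: convert the dataclasses to plain dicts yourself and JSON-encode any dict-valued field, so that polars never has to infer a struct with no fields. The helper name to_rows is hypothetical (it is not part of polars), and this assumes a string column is acceptable for the dict data.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TestClass:
    int_field: int
    dict_field: dict


def to_rows(items):
    """Hypothetical helper: dataclass instances -> plain dicts,
    with dict-valued fields JSON-encoded as strings."""
    rows = []
    for item in items:
        row = asdict(item)
        for name, value in row.items():
            if isinstance(value, dict):
                # An empty dict becomes the string "{}" rather than
                # something that would be inferred as a field-less struct.
                row[name] = json.dumps(value)
        rows.append(row)
    return rows


t = TestClass(int_field=10, dict_field={})
rows = to_rows([t, t])
print(rows)
```

The resulting rows hold only ints and strings, so passing them to pl.DataFrame(rows) should avoid the struct inference entirely. The trade-off is that the dict contents are then stored as JSON text and need decoding before use.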
@ritchie46 This bug is not fixed yet. With two items, I get the error: Could not create a new DataFrame from Series. The Series have different lengths. Got [shape: (2,)
import polars as pl
from dataclasses import dataclass
@dataclass
class TestClass:
    int_field: int
    dict_field: dict
t = TestClass(int_field=10, dict_field={})
df = pl.DataFrame([t,t])