OutOfSpec("A StructArray must contain at least one field") error when passing dataclass with empty dict field

Open rossmechanic opened this issue 2 years ago • 2 comments

Polars version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Issue description

I use Polars by passing a list of Python dataclasses into a DataFrame and then performing operations on that DataFrame. One of our dataclasses has a field of type dict, and the dict can sometimes be empty. If the dict field is empty, I get the following error:

pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: OutOfSpec("A StructArray must contain at least one field")

This is likely because Polars tries to convert that field into a struct, and a struct cannot have zero fields. I don't have this issue when passing dataclasses into pandas (presumably because pandas has no struct type).
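To confirm that empty dicts are what trigger the panic, a small pre-check can list the offending fields before the data reaches Polars. This is only a sketch using the standard library; `find_empty_dict_fields` is a hypothetical helper name, not part of Polars.

```python
from dataclasses import dataclass, fields, is_dataclass


def find_empty_dict_fields(rows):
    """Return (row_index, field_name) pairs whose value is an empty dict.

    Hypothetical helper for diagnosing the input; not a Polars API.
    """
    hits = []
    for i, row in enumerate(rows):
        if not is_dataclass(row):
            continue  # only dataclass instances are inspected
        for f in fields(row):
            value = getattr(row, f.name)
            if isinstance(value, dict) and not value:
                hits.append((i, f.name))
    return hits


@dataclass
class TestClass:
    int_field: int
    dict_field: dict


rows = [TestClass(10, {}), TestClass(11, {"a": 1})]
empty_hits = find_empty_dict_fields(rows)
```

If `empty_hits` is non-empty, those rows are the ones I would expect to hit the zero-field struct case described above.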

Reproducible example

import polars as pl
from dataclasses import dataclass

@dataclass
class TestClass:
    int_field: int
    dict_field: dict

t = TestClass(int_field=10, dict_field={})

df = pl.DataFrame([t])

Expected behavior

I'd expect it to make the dict into an object datatype or some other datatype. Or, at the very least, to error out with a more understandable error.
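Until something like that exists, one possible workaround is to turn each dataclass into a plain dict and JSON-encode any dict-valued field, so that no struct has to be inferred from it. The sketch below uses only the standard library; `to_row` is a hypothetical helper, and I am assuming (not having verified it on 0.15.10) that the resulting list of dicts can be passed to `pl.DataFrame` with the encoded field arriving as a string column.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TestClass:
    int_field: int
    dict_field: dict


def to_row(obj):
    """Convert a dataclass instance to a plain dict, JSON-encoding dict values.

    Hypothetical helper. Note that asdict() also turns nested dataclasses
    into dicts, so those would be JSON-encoded as well.
    """
    row = asdict(obj)
    return {
        key: json.dumps(value) if isinstance(value, dict) else value
        for key, value in row.items()
    }


rows = [to_row(TestClass(int_field=10, dict_field={}))]
# Assumption: rows can now be handed to pl.DataFrame(rows), with
# dict_field stored as a JSON string instead of a struct.
```

The trade-off is that the field is no longer a struct, so it would have to be decoded again before its keys can be queried.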

Installed versions

---Version info---
Polars: 0.15.10
Index type: UInt32
Platform: macOS-13.1-arm64-arm-64bit
Python: 3.10.4 (main, Aug  4 2022, 14:12:36) [Clang 13.1.6 (clang-1316.0.21.2.3)]
---Optional dependencies---
pyarrow: <not installed>
pandas: 1.5.2
numpy: 1.24.1
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: <not installed>

rossmechanic, Jan 02 '23 21:01

I'd just add that being able to define the struct ahead of time would solve this for us, but I don't see how to do that.

rossmechanic, Jan 02 '23 22:01

I also can't get past this by defining the columns param. If the object has a dict field, it errors out no matter what.

rossmechanic, Jan 02 '23 23:01

@ritchie46 This bug is not fixed yet. With two items, I get this error: Could not create a new DataFrame from Series. The Series have different lengths. Got [shape: (2,)

import polars as pl
from dataclasses import dataclass

@dataclass
class TestClass:
    int_field: int
    dict_field: dict

t = TestClass(int_field=10, dict_field={})

df = pl.DataFrame([t,t])

pebble2050, Jan 06 '23 08:01