
object dtype not supported in Series.iter

Open polikutinevgeny opened this issue 2 years ago • 5 comments

Checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl


def func(d):
    return 1


df = pl.DataFrame({"col": pl.Series(["1", "2", "3"], dtype=pl.Object)})

# df.select(pl.col("col").apply(func))  # This works
df.select(pl.struct("col").apply(func))  # This doesn't

Issue description

thread '<unnamed>' panicked at 'object dtype not supported in Series.iter', /home/runner/work/polars/polars/polars/polars-core/src/series/iterator.rs:70:9
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.9/site-packages/polars/expr/expr.py", line 3821, in wrap_f
    return x.apply(
  File "/opt/venv/lib/python3.9/site-packages/polars/series/series.py", line 4560, in apply
    self._s.apply_lambda(function, pl_return_dtype, skip_nulls)
pyo3_runtime.PanicException: object dtype not supported in Series.iter
Traceback (most recent call last):
  File "/opt/venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-11-e829c382dcd2>", line 1, in <module>
    df.select(pl.struct("col").apply(func))
  File "/opt/venv/lib/python3.9/site-packages/polars/dataframe/frame.py", line 7301, in select
    return self.lazy().select(*exprs, **named_exprs).collect(no_optimization=True)
  File "/opt/venv/lib/python3.9/site-packages/polars/lazyframe/frame.py", line 1530, in collect
    return wrap_df(ldf.collect())
pyo3_runtime.PanicException: Unwrapped panic from Python code

Expected behavior

The function is applied over the struct without errors.

Installed versions

--------Version info---------
Polars:              0.18.9
Index type:          UInt32
Platform:            Linux-6.4.7-arch1-1-x86_64-with-glibc2.31
Python:              3.9.16 (main, Mar 23 2023, 04:33:57) 
[GCC 10.2.1 20210110]
----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         2.2.1
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              2023.4.0
matplotlib:          3.7.1
numpy:               1.23.2
pandas:              1.5.3
pyarrow:             11.0.0
pydantic:            1.10.7
sqlalchemy:          1.4.47
xlsx2csv:            <not installed>
xlsxwriter:          <not installed>

polikutinevgeny, Jul 31 '23 05:07

import polars as pl


def func(d):
    return 1


df = pl.DataFrame({"col": pl.Series(["1", "2", "3"], dtype=pl.Object)})

df.apply(func)

This also fails.

polikutinevgeny, Jul 31 '23 05:07

Yep, we don't support that for object types yet. Try to avoid objects.

ritchie46, Jul 31 '23 06:07

Until we can support object types here, we should throw a nice error somewhere.

stinodego, Jul 31 '23 18:07

Please, pay attention to this. With this error, it's not possible to work with UUID columns.

joaoflaviosantos, Mar 24 '24 02:03

Please, pay attention to this. With this error, it's not possible to work with UUID columns.

You can store your UUID as a string column.

stinodego, Mar 24 '24 03:03

Why does pl.col work fine with Object, but pl.struct does not? This also makes aggregating over multiple columns impossible when one of them is an Object column.

A shorter reproduction:

import polars as pl

# runs correctly
s = pl.Series("a", [object()])
s.map_elements(lambda x: x, return_dtype=pl.Object)

# fails
s = pl.Series("a", [{"s": object()}])
s.map_elements(lambda x: x["s"], return_dtype=pl.Object)

metab0t, May 23 '24 04:05

iter_rows also happily runs on rows with object columns.

vspinu, May 19 '25 15:05