`df.upsample` hangs when an "object" type column is present
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Given a dataframe with an "object" type column
from datetime import datetime
from pathlib import Path

import polars as pl

df = pl.DataFrame(
    {
        "filepath": [
            Path("/this/path/file1"),
            Path("/this/path/file2"),
            Path("/this/path/file3"),
            Path("/this/path/file4"),
        ],
        "creation_date": [
            datetime(2024, 1, 1),
            datetime(2024, 1, 2),
            datetime(2024, 1, 5),
            datetime(2024, 1, 6),
        ],
        "file_size": [1, 2, 3, 4],
        "file_name": ["file1", "file2", "file3", "file4"],
    }
)
┌──────────────────┬─────────────────────┬───────────┬───────────┐
│ filepath         ┆ creation_date       ┆ file_size ┆ file_name │
│ ---              ┆ ---                 ┆ ---       ┆ ---       │
│ object           ┆ datetime[μs]        ┆ i64       ┆ str       │
╞══════════════════╪═════════════════════╪═══════════╪═══════════╡
│ /this/path/file1 ┆ 2024-01-01 00:00:00 ┆ 1         ┆ file1     │
│ /this/path/file2 ┆ 2024-01-02 00:00:00 ┆ 2         ┆ file2     │
│ /this/path/file3 ┆ 2024-01-05 00:00:00 ┆ 3         ┆ file3     │
│ /this/path/file4 ┆ 2024-01-06 00:00:00 ┆ 4         ┆ file4     │
└──────────────────┴─────────────────────┴───────────┴───────────┘
"upsampling" appears to hang...
df.upsample("creation_date", every="1d")  # <-- never finishes
Log output
none
Issue description
I have a dataframe with an "object" type column filled with pathlib.Path objects. When I try to "upsample" the dataframe by the "creation_date" column, it hangs.
However, if I remove the "object" column, upsample works fine.
df.select(pl.exclude(pl.Object)).upsample("creation_date", every="1d")
shape: (6, 3)
┌─────────────────────┬───────────┬───────────┐
│ creation_date       ┆ file_size ┆ file_name │
│ ---                 ┆ ---       ┆ ---       │
│ datetime[μs]        ┆ i64       ┆ str       │
╞═════════════════════╪═══════════╪═══════════╡
│ 2024-01-01 00:00:00 ┆ 1         ┆ file1     │
│ 2024-01-02 00:00:00 ┆ 2         ┆ file2     │
│ 2024-01-03 00:00:00 ┆ null      ┆ null      │
│ 2024-01-04 00:00:00 ┆ null      ┆ null      │
│ 2024-01-05 00:00:00 ┆ 3         ┆ file3     │
│ 2024-01-06 00:00:00 ┆ 4         ┆ file4     │
└─────────────────────┴───────────┴───────────┘
Expected behavior
I expected the `upsample` method to work even in the presence of an "object" type column, and to fill values in the "object" column with `null`, like it does for the other columns.
Installed versions
--------Version info---------
Polars: 1.6.0
Index type: UInt32
Platform: Linux-5.14.21-150400.24.119-default-x86_64-with-glibc2.31
Python: 3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) [GCC 12.4.0]
----Optional dependencies----
adbc_driver_manager <not installed>
altair <not installed>
cloudpickle 3.0.0
connectorx <not installed>
deltalake <not installed>
fastexcel <not installed>
fsspec 2024.6.1
gevent <not installed>
great_tables <not installed>
matplotlib 3.9.2
nest_asyncio 1.6.0
numpy 2.0.1
openpyxl <not installed>
pandas 2.2.2
pyarrow 17.0.0
pydantic <not installed>
pyiceberg <not installed>
sqlalchemy <not installed>
torch <not installed>
xlsx2csv <not installed>
xlsxwriter <not installed>