`df.upsample` hangs when an "object" type column is present

Open blaylockbk opened this issue 5 months ago • 1 comment

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Given a dataframe with an "object" type column:

from datetime import datetime
from pathlib import Path
import polars as pl

df = pl.DataFrame(
    {
        "filepath": [
            Path("/this/path/file1"),
            Path("/this/path/file2"),
            Path("/this/path/file3"),
            Path("/this/path/file4"),
        ],
        "creation_date": [
            datetime(2024, 1, 1),
            datetime(2024, 1, 2),
            datetime(2024, 1, 5),
            datetime(2024, 1, 6),
        ],
        "file_size": [1, 2, 3, 4],
        "file_name": ["file1", "file2", "file3", "file4"],
    }
)
┌──────────────────┬─────────────────────┬───────────┬───────────┐
│ filepath         ┆ creation_date       ┆ file_size ┆ file_name │
│ ---              ┆ ---                 ┆ ---       ┆ ---       │
│ object           ┆ datetime[μs]        ┆ i64       ┆ str       │
╞══════════════════╪═════════════════════╪═══════════╪═══════════╡
│ /this/path/file1 ┆ 2024-01-01 00:00:00 ┆ 1         ┆ file1     │
│ /this/path/file2 ┆ 2024-01-02 00:00:00 ┆ 2         ┆ file2     │
│ /this/path/file3 ┆ 2024-01-05 00:00:00 ┆ 3         ┆ file3     │
│ /this/path/file4 ┆ 2024-01-06 00:00:00 ┆ 4         ┆ file4     │
└──────────────────┴─────────────────────┴───────────┴───────────┘

"upsampling" appears to hang...

df.upsample("creation_date", every="1d")  #<-- never finishes

Log output

none

Issue description

I have a dataframe with an "object" type column filled with pathlib.Path objects. When I try to "upsample" the dataframe by the "creation_date" column, it hangs.

However, if I remove the "object" column, upsample works fine.

df.select(pl.exclude(pl.Object)).upsample("creation_date", every="1d")
shape: (6, 3)
┌─────────────────────┬───────────┬───────────┐
│ creation_date       ┆ file_size ┆ file_name │
│ ---                 ┆ ---       ┆ ---       │
│ datetime[μs]        ┆ i64       ┆ str       │
╞═════════════════════╪═══════════╪═══════════╡
│ 2024-01-01 00:00:00 ┆ 1         ┆ file1     │
│ 2024-01-02 00:00:00 ┆ 2         ┆ file2     │
│ 2024-01-03 00:00:00 ┆ null      ┆ null      │
│ 2024-01-04 00:00:00 ┆ null      ┆ null      │
│ 2024-01-05 00:00:00 ┆ 3         ┆ file3     │
│ 2024-01-06 00:00:00 ┆ 4         ┆ file4     │
└─────────────────────┴───────────┴───────────┘

Expected behavior

I expected the upsample method to work even in the presence of an "object" type column, and to fill the missing values in the "object" column with null, as it does for the other columns.

Installed versions

--------Version info---------
Polars:              1.6.0
Index type:          UInt32
Platform:            Linux-5.14.21-150400.24.119-default-x86_64-with-glibc2.31
Python:              3.12.5 | packaged by conda-forge | (main, Aug  8 2024, 18:36:51) [GCC 12.4.0]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.6.1
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.0.1
openpyxl             <not installed>
pandas               2.2.2
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

blaylockbk, Aug 28 '24 20:08