
Calculations finish locally but OOM when run in Docker

Open · sr379-xyt opened this issue 1 year ago

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import pandas as pd

# Eagerly materialise ~288 million timestamps (8 hours at a 100 µs interval).
data = pl.datetime_range(
    start=pl.lit("2024-08-19T08:00:00", dtype=pl.Datetime(time_unit="ns")),
    end=pl.lit("2024-08-19T16:00:00", dtype=pl.Datetime(time_unit="ns")),
    interval="100us",
    eager=True,
)

df = pl.DataFrame(data, schema={"datetime": pl.Datetime(time_unit="ns")})


def create_calculation_plan(
    df: pl.DataFrame, end_dt: pd.Timestamp, offset_s: int
) -> pl.LazyFrame:
    plan = (
        df.lazy()
        .filter(
            pl.col("datetime")
            <= pl.lit(
                end_dt - pd.Timedelta(seconds=offset_s),
                dtype=pl.Datetime(time_unit="ns"),
            )
        )
        .unique()
        .last()
    )

    return plan


calculation_plans = [
    create_calculation_plan(df, pd.Timestamp("2024-08-19T16:00:00"), offset)
    for offset in range(3)
]

pl.collect_all(calculation_plans)

Log output

No response

Issue description

The code finishes successfully when run locally on Windows with approx. 8 GB of memory available. It saturates memory (100% RAM utilisation) during the calculations, yet it completes even when run in two separate processes simultaneously. However, when run locally in Docker, the container is instantly OOM-killed (no memory limits were set, so all available memory could be used). Similar code also ran successfully on local Windows, but a Docker container on Kubernetes with more RAM available (12 GB) was likewise OOM-killed. I think the problem could be connected with cgroup limits; however, the issue regarding those was completed: https://github.com/pola-rs/polars/issues/15797
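The cgroup hypothesis can be checked from inside the container. A minimal sketch (my own assumption about what to inspect, not anything Polars-specific) that reads the cgroup v2 and v1 memory-limit files:

```python
def cgroup_memory_limit():
    """Best-effort read of the container memory limit in bytes.

    Checks cgroup v2 first, then cgroup v1. Returns None when no limit
    is set or the files are absent (e.g. outside a Linux container).
    """
    paths = [
        "/sys/fs/cgroup/memory.max",                     # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",   # cgroup v1
    ]
    for path in paths:
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file not present on this system
        if raw == "max":
            return None  # v2: explicitly unlimited
        limit = int(raw)
        # v1 reports a huge sentinel value (close to 2**63) when unlimited
        if limit >= 1 << 60:
            return None
        return limit
    return None


print(cgroup_memory_limit())
```

Running this inside the container versus on the host would show whether Docker is actually imposing a limit despite none being configured.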

Expected behavior

The code finishes successfully when executed in a Docker container.

Installed versions

--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             Windows-11-10.0.22631-SP0
Python:               3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                2.1.0
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

sr379-xyt avatar Aug 27 '24 15:08 sr379-xyt

Maybe you are swapping locally and you cannot swap in Docker?

In any case, going out of memory is not a bug we can fix. You need to get more memory for the job.

ritchie46 avatar Aug 28 '24 06:08 ritchie46
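The swap hypothesis is easy to verify inside the container. A minimal Linux-only sketch (reading `/proc/meminfo` directly, an assumption on my part rather than anything from the Polars API) that reports the configured swap:

```python
def swap_total_kib():
    """Return SwapTotal from /proc/meminfo in KiB, or None if unavailable.

    A value of 0 means no swap is configured, which is the default inside
    Docker containers and would explain why a workload that swaps on the
    host gets OOM-killed in a container instead.
    """
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("SwapTotal:"):
                    # line looks like: "SwapTotal:  2097148 kB"
                    return int(line.split()[1])
    except OSError:
        return None  # not Linux, or /proc not mounted
    return None


print(swap_total_kib())
```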

Thanks for the response. Is there any way Polars can throw an exception when there is not enough memory, or is an OOM kill the only possible outcome?

sr379-xyt avatar Aug 29 '24 09:08 sr379-xyt

No, restoring from OOM is not really feasible.

ritchie46 avatar Aug 29 '24 10:08 ritchie46
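Since the process itself cannot recover from an OOM kill, one workaround sketch (plain standard-library `multiprocessing`, not a Polars feature) is to run the collection in a child process: if the kernel's OOM killer terminates the child with SIGKILL, the parent survives and observes a negative exit code instead of dying itself.

```python
import multiprocessing as mp


def risky_job():
    # Placeholder for the real workload, e.g. pl.collect_all(calculation_plans).
    data = [0] * 10
    return len(data)


def run_guarded(target):
    """Run target in a child process and return its exit code.

    If the kernel OOM-killer terminates the child, the parent sees a
    negative exitcode (the signal number, -9 for SIGKILL) rather than
    being killed itself, so it can log the failure or retry.
    """
    proc = mp.Process(target=target)
    proc.start()
    proc.join()
    return proc.exitcode  # 0 on success, negative if killed by a signal


if __name__ == "__main__":
    code = run_guarded(risky_job)
    print("child exit code:", code)
```

The trade-off is that results must be passed back explicitly (e.g. via a file or a `multiprocessing.Queue`), and spawning a fresh process adds startup overhead per attempt.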