polars
Calculations finish locally but OOM when run in Docker
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl
import pandas as pd

data = pl.datetime_range(
    start=pl.lit("2024-08-19T08:00:00", dtype=pl.Datetime(time_unit="ns")),
    end=pl.lit("2024-08-19T16:00:00", dtype=pl.Datetime(time_unit="ns")),
    interval="100us",
    eager=True,
)
df = pl.DataFrame(data, schema={"datetime": pl.Datetime(time_unit="ns")})


def create_calculation_plan(
    df: pl.DataFrame, end_dt: pd.Timestamp, offset_s: int
) -> pl.LazyFrame:
    plan = (
        df.lazy()
        .filter(
            pl.col("datetime")
            <= pl.lit(
                end_dt - pd.Timedelta(seconds=offset_s),
                dtype=pl.Datetime(time_unit="ns"),
            )
        )
        .unique()
        .last()
    )
    return plan


calculation_plans = [
    create_calculation_plan(df, pd.Timestamp("2024-08-19T16:00:00"), offset)
    for offset in range(3)
]
pl.collect_all(calculation_plans)
Log output
No response
Issue description
The code finishes successfully when run locally on Windows with approx. 8 GB of memory available, although it saturates memory (100% RAM utilisation) during the calculations. It even finishes successfully when run in two separate processes simultaneously. However, when run locally in Docker, the container is instantly OOM killed (no memory limits are set, so all available memory could be used). Similar code also ran successfully on local Windows, but a Docker container on Kubernetes with more RAM available (12 GB) was OOM killed as well. I think the problem could be connected with cgroups limits, although the issue regarding those has already been closed as completed: https://github.com/pola-rs/polars/issues/15797
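To check the cgroups suspicion from inside the container, one option is to read the limit the kernel actually enforces. The sketch below is only an illustration: it assumes a Linux container with the usual cgroup mount points, and the helper names `parse_cgroup_limit` and `read_container_memory_limit` are hypothetical, not part of Polars or Docker.

```python
from pathlib import Path
from typing import Optional

# Usual Linux cgroup locations (assumption: they may be absent or mounted
# elsewhere depending on the host and container runtime).
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_cgroup_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when cgroup v2 reports no limit ("max")."""
    value = raw.strip()
    if value == "max":
        return None
    # Note: cgroup v1 typically reports a very large number instead of "max".
    return int(value)


def read_container_memory_limit() -> Optional[int]:
    """Try the cgroup v2 file first, then v1; None if unlimited or not readable."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            return parse_cgroup_limit(path.read_text())
        except (OSError, ValueError):
            continue
    return None
```

Printing `read_container_memory_limit()` at the start of the job shows whether the container really sees no limit, or a limit smaller than expected.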
Expected behavior
The code finishes successfully when executed in a Docker container.
Installed versions
--------Version info---------
Polars: 1.5.0
Index type: UInt32
Platform: Windows-11-10.0.22631-SP0
Python: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: <not installed>
connectorx: <not installed>
deltalake: <not installed>
fastexcel: <not installed>
fsspec: <not installed>
gevent: <not installed>
great_tables: <not installed>
hvplot: <not installed>
matplotlib: <not installed>
nest_asyncio: <not installed>
numpy: 2.1.0
openpyxl: <not installed>
pandas: 2.2.2
pyarrow: <not installed>
pydantic: <not installed>
pyiceberg: <not installed>
sqlalchemy: <not installed>
torch: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>
Maybe you are swapping locally and you cannot swap in docker?
In any case, running out of memory is not a bug we can fix. You need to get more memory for the job.
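For a rough sense of how much memory the job needs, the size of the input column alone can be estimated from the range in the reproducible example. This is a back-of-the-envelope sketch, assuming both endpoints are included and that each datetime value takes 8 bytes; whatever `unique()` and the three concurrently collected plans allocate comes on top of it and is not modelled here.

```python
from datetime import datetime, timedelta

start = datetime(2024, 8, 19, 8, 0, 0)
end = datetime(2024, 8, 19, 16, 0, 0)
interval = timedelta(microseconds=100)

# Both endpoints are counted, hence the + 1 (assumption: closed on both sides).
n_rows = (end - start) // interval + 1

# Assumption: one 64-bit integer per datetime value.
bytes_per_value = 8
column_bytes = n_rows * bytes_per_value

print(n_rows)                          # 288000001
print(round(column_bytes / 2**30, 2))  # 2.15 (GiB)
```

So the base column is already about 2.15 GiB before any intermediate results are materialised.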
Thanks for the response. Is there any way Polars can raise an exception when there is not enough memory, or is an OOM kill the only possible outcome?
No, recovering from OOM is not really feasible.
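Since the process itself cannot recover, one possible workaround on the user side is to run the heavy computation in a child process and let the parent turn a kill into an exception. The following is a minimal sketch, assuming a POSIX system (such as a Linux container) where a killed child reports a negative return code; `was_killed` and `run_isolated` are hypothetical helper names. A SIGKILL can also come from sources other than the OOM killer, and the kernel may pick the parent instead of the child, so this is not a guarantee.

```python
import signal
import subprocess
import sys


def was_killed(returncode: int) -> bool:
    """True if the child was terminated by SIGKILL (reported as a negative code)."""
    return returncode == -signal.SIGKILL


def run_isolated(code: str) -> None:
    """Run Python source in a separate interpreter; raise if it was killed."""
    result = subprocess.run([sys.executable, "-c", code])
    if was_killed(result.returncode):
        # Assumption: in a memory-constrained container, SIGKILL most likely
        # came from the OOM killer, but this cannot be proven from here.
        raise MemoryError("worker was terminated by SIGKILL, possibly due to OOM")
    # Any other non-zero exit raises subprocess.CalledProcessError.
    result.check_returncode()
```

The parent can then catch `MemoryError` and, for example, retry with a smaller input or report the failure, instead of the whole container disappearing without a message.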