memory_map creates copies during certain operations since 0.19.4

Open kevfly16 opened this issue 1 year ago • 5 comments

Checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

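import polars as pl
# Memory-map the Arrow IPC file instead of reading it into process memory.
df = pl.read_ipc("data.feather", memory_map=True)
# Index by multiple integer positions; this triggers the copy since 0.19.4.
df[[1, 2]]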

The line df[[1,2]] copies the entire DataFrame into in-process memory.

Log output

No response

Issue description

When reading an Arrow-backed DataFrame (e.g., with read_ipc) from a RAM disk with memory_map=True, Polars creates a zero-copy DataFrame. In version 0.19.3, indexing the DataFrame by multiple integer positions or performing a join on an already sorted DataFrame was also zero-copy. Since version 0.19.4, these operations are no longer zero-copy and instead copy the DataFrame into in-process memory. With a large enough file this becomes obvious: the operations take much longer and, in some cases, the process runs out of memory.
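
One rough way to observe the copy, as a sketch rather than a definitive measurement: compare the process's peak resident memory before and after the operation. The helper names peak_rss_mib and peak_rss_growth are hypothetical, and the resource module is Unix-only. Touched memory-mapped pages can also count toward resident memory, so treat the number as a signal and not as proof; a full copy would be expected to grow the peak by roughly the size of the file.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def peak_rss_growth(operation):
    """Run operation() and return (result, growth of peak RSS in MiB)."""
    before = peak_rss_mib()
    result = operation()
    return result, peak_rss_mib() - before


# Hypothetical usage with the reproducible example from this issue:
#   df = pl.read_ipc("data.feather", memory_map=True)
#   _, grown = peak_rss_growth(lambda: df[[1, 2]])
#   print(f"peak RSS grew by {grown:.1f} MiB")
```

Running the same measurement under 0.19.3 and 0.19.4 or later should make the difference visible if the file is large compared to the baseline memory of the process.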

Expected behavior

Similar to the behavior in version 0.19.3, I would expect that these operations remain zero-copy in newer versions.

Installed versions

--------Version info---------
Polars:              0.19.7
Index type:          UInt32
Platform:            macOS-13.5.1-arm64-arm-64bit
Python:              3.11.2 (v3.11.2:878ead1ac1, Feb  7 2023, 10:02:41) [Clang 13.0.0 (clang-1300.0.29.30)]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         2.2.1
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              2023.3.0
gevent:              <not installed>
matplotlib:          <not installed>
numpy:               1.25.0
openpyxl:            <not installed>
pandas:              2.1.1
pyarrow:             13.0.0
pydantic:            2.4.2
pyiceberg:           <not installed>
pyxlsb:              <not installed>
sqlalchemy:          <not installed>
xlsx2csv:            <not installed>
xlsxwriter:          <not installed>

kevfly16, Oct 05 '23 16:10