memory_map creates copies during certain operations since 0.19.4
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
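import polars as pl

# Memory-map the IPC (Feather) file rather than reading it into process memory.
df = pl.read_ipc("data.feather", memory_map=True)
# Select rows by integer position.
df[[1, 2]]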
The line df[[1,2]] will start copying the entire DataFrame into in-process memory.
Log output
No response
Issue description
When reading an Arrow-backed DataFrame (e.g., with read_ipc) from a RAM disk using the option memory_map=True, Polars creates a zero-copy DataFrame. In version 0.19.3, indexing the DataFrame by multiple integer positions or performing a join on an already sorted DataFrame was also zero-copy. Since version 0.19.4, these operations are no longer zero-copy and instead copy the DataFrame into in-process memory. With a large enough file this becomes obvious because the operations take much longer and, in some cases, the process runs out of memory.
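For anyone unfamiliar with the distinction, here is a minimal sketch of "zero-copy view" versus "copy into process memory" using only Python's standard library. It is an analogy, not a description of Polars internals: the helper names traced_growth and compare_view_and_copy are hypothetical, the temporary file stands in for data.feather, and tracemalloc only sees allocations made through Python's own allocators, so it would not capture the allocations Polars makes on the Rust side.

```python
import mmap
import os
import tempfile
import tracemalloc


def traced_growth(fn):
    """Run fn() and return (result, growth of the traced Python heap in bytes)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = fn()  # keep the result alive so its allocation is counted
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, after - before


def compare_view_and_copy(size=8 * 1024 * 1024):
    """Compare slicing a memory-mapped file with copying it into a bytes object."""
    # Throwaway file standing in for a large on-disk dataset (assumption).
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x01" * size)
        with open(path, "rb") as f, mmap.mmap(
            f.fileno(), 0, access=mmap.ACCESS_READ
        ) as mm:
            with memoryview(mm) as view:
                # Slicing a memoryview yields another view onto the mapping.
                window, view_growth = traced_growth(lambda: view[1:3])
                window.release()  # views must be released before the map closes
                # bytes(...) materializes the whole mapping on the process heap.
                copied, copy_growth = traced_growth(lambda: bytes(view))
    finally:
        os.remove(path)
    return view_growth, copy_growth, len(copied)


if __name__ == "__main__":
    view_growth, copy_growth, n = compare_view_and_copy()
    print(f"file size:        {n} bytes")
    print(f"view heap growth: {view_growth} bytes")
    print(f"copy heap growth: {copy_growth} bytes")
```

In this sketch the slice allocates only a small view object, while the copy grows the heap by roughly the size of the file. That second pattern is what I am seeing in Polars since 0.19.4.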
Expected behavior
As in version 0.19.3, I would expect these operations to remain zero-copy in newer versions.
Installed versions
--------Version info---------
Polars: 0.19.7
Index type: UInt32
Platform: macOS-13.5.1-arm64-arm-64bit
Python: 3.11.2 (v3.11.2:878ead1ac1, Feb 7 2023, 10:02:41) [Clang 13.0.0 (clang-1300.0.29.30)]
----Optional dependencies----
adbc_driver_sqlite: <not installed>
cloudpickle: 2.2.1
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2023.3.0
gevent: <not installed>
matplotlib: <not installed>
numpy: 1.25.0
openpyxl: <not installed>
pandas: 2.1.1
pyarrow: 13.0.0
pydantic: 2.4.2
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>