Ritchie Vink
I did a local test and can confirm that memory stays stable if we set the row group size to something like 1 million. Is this the case for you/others as...
Strange that I cannot reproduce it. We only keep around the metadata from the row groups. If your row groups are too small it could be that the amount of...
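To make the row-group point concrete, here is a rough back-of-the-envelope sketch. It only counts row groups; the per-column-chunk metadata size (`bytes_per_column_chunk`) and the column count are made-up illustration values, not numbers measured from Polars or arrow2, and the row count is simply the `n` from the snippet in this thread.

```python
def row_group_count(n_rows: int, row_group_size: int) -> int:
    """Number of row groups needed to hold n_rows rows (integer ceiling division)."""
    if row_group_size <= 0:
        raise ValueError("row_group_size must be positive")
    return (n_rows + row_group_size - 1) // row_group_size


def estimated_metadata_bytes(
    n_rows: int,
    row_group_size: int,
    n_columns: int,
    bytes_per_column_chunk: int,
) -> int:
    """Rough metadata footprint: one metadata entry per column per row group.

    bytes_per_column_chunk is a hypothetical figure used only for illustration.
    """
    return row_group_count(n_rows, row_group_size) * n_columns * bytes_per_column_chunk


n_rows = 500_000_000  # same row count as the snippet in this thread
for size in (1_000, 1_000_000):
    groups = row_group_count(n_rows, size)
    # 10 columns and 200 bytes per column chunk are assumed values
    meta = estimated_metadata_bytes(n_rows, size, n_columns=10, bytes_per_column_chunk=200)
    print(f"row_group_size={size}: {groups} row groups, ~{meta} bytes of metadata")
```

Under these assumptions, the number of row groups (and so the metadata kept around) shrinks by the same factor that the row group size grows.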
Yes, it certainly is. I shall investigate when I can get my hands on a bigger machine. Seems something quadratic to me. Any insight from the profiler?
OK, I have been able to reproduce this at the edge of my RAM capacity. I am exploring it now.
@jorgecarleitao could you help me a bit with this one? I ran this snippet:

```rust
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let n = 5_00_000_000u32;
    let ca = Utf8Chunked::from_iter((0..n).map(|_|...
```
I did a run where I checked `debug_assertions` and `overflow-checks`, but that did not influence this.
@vikigenius I found and fixed the issue upstream: https://github.com/jorgecarleitao/arrow2/issues/1293. I will release a patch this weekend. Thanks for being patient and helping me remember this issue when it got stale....
This works: https://github.com/pola-rs/polars/tree/master/examples/python_rust_compiled_function
Thanks for the report. Are you certain we memory map if `use_pyarrow=False`? https://github.com/pola-rs/polars/blob/ac916d2dadc132cb78268ca8ab17210d1bec3777/py-polars/polars/io.py#L812 We use pyarrow for memory mapping, and AFAICT we only take that branch if `use_pyarrow=True`.
Oh sorry, I thought we were talking about IPC. For parquet we simply memory map the whole file and then copy it into memory. We do the same with the...
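As a minimal illustration of "memory map the whole file and then copy it into memory", here is a standard-library sketch. It is not the Polars reader; `read_via_mmap` is a hypothetical helper that only shows why this pattern still needs at least the file size in RAM once the copy is made.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str) -> bytes:
    """Map the whole file read-only, then copy the mapped bytes into memory."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b""  # an empty file cannot be memory mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing an mmap object returns a new bytes object, i.e. a full copy.
            return mapped[:]


# Small demonstration with a temporary file standing in for a parquet file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"not really parquet, just some bytes")
    tmp_path = tmp.name

try:
    data = read_via_mmap(tmp_path)
    print(len(data), "bytes copied out of the mapping")
finally:
    os.remove(tmp_path)
```

The mapping itself is cheap, but the copy is what makes the whole file resident in process memory.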