Uri Laserson
And the same thing happens if I first execute `pl.Config.set_streaming_chunk_size(10)`.
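For reference, the shape of what I'm running is roughly this; the paths are placeholders and the real query may differ slightly:

```python
import polars as pl

# Shrink the streaming chunk size before building the query.
pl.Config.set_streaming_chunk_size(10)

# Placeholder paths; the real input is a large CSV (originally gzipped).
lf = pl.scan_csv("big_file.csv")
lf.sink_parquet("big_file.parquet")  # memory still balloons before any rows are written
```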
And finally, perhaps a related issue: a call to `pl.read_csv_batched` also uses an enormous amount of memory before I even request any batches of data.
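To make that concrete (placeholder path and batch size):

```python
import polars as pl

# The memory spike happens on this call, before any batches are requested.
reader = pl.read_csv_batched("big_file.csv", batch_size=100_000)

# Only here would I expect rows to actually be materialized.
batches = reader.next_batches(5)
```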
Yes, I can confirm the same behavior if I first uncompress the file.
> We don't support streaming decompression yet.

Is there a workaround for this? Perhaps by giving `scan_csv` a file-like object that handles the decompression (rough sketch below)?

> I cannot reproduce this. Slice...
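To be concrete about the file-like object idea, this is roughly what I had in mind; I don't know whether `scan_csv` can actually accept a file object today, so treat it as illustrative only:

```python
import gzip

import polars as pl

# Purely illustrative: hand Polars a stream that decompresses on the fly,
# so the full decompressed CSV never has to sit in memory at once.
with gzip.open("big_file.csv.gz", "rb") as f:
    lf = pl.scan_csv(f)  # assuming a file-like source were accepted here
    lf.sink_parquet("big_file.parquet")
```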
As an alternative, could you recommend another way to convert the large gzipped CSV file into Parquet without needing a huge amount of memory?
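For example, something along the lines of this pyarrow-based sketch is the kind of approach I'm hoping is possible (untested, paths are placeholders):

```python
import gzip

import pyarrow as pa
import pyarrow.csv as pv
import pyarrow.parquet as pq

# Stream-decompress the gzip and read the CSV as record batches,
# writing each batch to Parquet so the whole table is never in memory.
with gzip.open("big_file.csv.gz", "rb") as f:
    reader = pv.open_csv(f)
    writer = None
    for batch in reader:
        if writer is None:
            writer = pq.ParquetWriter("big_file.parquet", batch.schema)
        writer.write_table(pa.Table.from_batches([batch]))
    if writer is not None:
        writer.close()
```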
Thanks @natemcintosh! Actually, I tried DuckDB first and only picked up Polars when it failed on these jobs. I was pretty surprised that DuckDB was dying even though...
Yes, similarly: no matter what variation of the query/config I tried, it always seemed to load the entire dataset into memory before doing anything else, and then...