Uri Laserson
And the same thing happens if I first execute `pl.Config.set_streaming_chunk_size(10)`.
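For reference, the shape of what I'm running is roughly this; the paths are placeholders and the real query may differ slightly:

```python
import polars as pl

# Shrink the streaming chunk size before building the query.
pl.Config.set_streaming_chunk_size(10)

# Placeholder paths; the real input is a large CSV (originally gzipped).
lf = pl.scan_csv("big_file.csv")
lf.sink_parquet("big_file.parquet")  # memory still balloons before any rows are written
```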
And finally, perhaps a related issue: a call to `pl.read_csv_batched` also uses an enormous amount of memory before I even request any batches of data.
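To make that concrete (placeholder path and batch size):

```python
import polars as pl

# The memory spike happens on this call, before any batches are requested.
reader = pl.read_csv_batched("big_file.csv", batch_size=100_000)

# Only here would I expect rows to actually be materialized.
batches = reader.next_batches(5)
```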
Yes, I can confirm the same behavior if I first uncompress the file.
> We don't support streaming decompression yet.

Is there a workaround for this? Perhaps by giving `scan_csv` a file-like object that handles the decompression (rough sketch below)?

> I cannot reproduce this. Slice...
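To be concrete about the file-like object idea, this is roughly what I had in mind; I don't know whether `scan_csv` can actually accept a file object today, so treat it as illustrative only:

```python
import gzip

import polars as pl

# Purely illustrative: hand Polars a stream that decompresses on the fly,
# so the full decompressed CSV never has to sit in memory at once.
with gzip.open("big_file.csv.gz", "rb") as f:
    lf = pl.scan_csv(f)  # assuming a file-like source were accepted here
    lf.sink_parquet("big_file.parquet")
```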
As an alternative, could you recommend another way to convert the large gzipped CSV file into Parquet without needing a huge amount of memory?
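For example, something along the lines of this pyarrow-based sketch is the kind of approach I'm hoping is possible (untested, paths are placeholders):

```python
import gzip

import pyarrow as pa
import pyarrow.csv as pv
import pyarrow.parquet as pq

# Stream-decompress the gzip and read the CSV as record batches,
# writing each batch to Parquet so the whole table is never in memory.
with gzip.open("big_file.csv.gz", "rb") as f:
    reader = pv.open_csv(f)
    writer = None
    for batch in reader:
        if writer is None:
            writer = pq.ParquetWriter("big_file.parquet", batch.schema)
        writer.write_table(pa.Table.from_batches([batch]))
    if writer is not None:
        writer.close()
```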
Thanks @natemcintosh! Actually, I tried DuckDB first and only picked up Polars when it failed on these jobs. I was pretty surprised that DuckDB was dying even though...
Yes, similarly: no matter what variation of the query/config I tried, it always seemed to load the entire dataset into memory before doing anything else, and then...