polars
scan_parquet from io.BytesIO()
Problem description
Add the ability to pass an io.BytesIO() object as the source parameter of scan_parquet. Currently it only accepts a path to one or more files.
This would be useful when a program receives Parquet data through a REST API or a socket, directly into memory.
I am pretty sure that this is a duplicate. :thinking:
True, I'm sorry. I've found some related issues: #4950 and #9511. They are all about scan_csv, but you can definitely close this as a duplicate. Just don't forget about Parquet too :)
It would also be great if the scan_* and read_* functions had a unified input type for files, bytes, and so on. It would also be nice if they accepted a list of BytesIO or path-like objects, so they could be processed in parallel, as with a glob pattern.
My application has Parquet embedded as BLOBs in SQL tables, and processes and combines them lazily. I would love to see support for this - at the moment I have to use read_parquet() and miss out on pushdown optimisations.
A similar use case here. We have a bunch of Parquet files in memory I want to work with, without having all of them in memory at the same time.
I would be very happy with this improvement. I have about a million Parquet files stored as binary values in Redis, and I want to read them as LazyFrames to save memory.