scan_parquet from io.BytesIO()

Open s-b90 opened this issue 2 years ago • 6 comments

Problem description

Add the ability to accept an io.BytesIO object as the source parameter for scan_parquet. At the moment it accepts only a path to one or more files. This feature would be useful when a program receives Parquet data through a REST API or a socket, directly into memory.

s-b90 avatar Aug 10 '23 17:08 s-b90

I am pretty sure that this is a duplicate. :thinking:

ritchie46 avatar Aug 11 '23 07:08 ritchie46

True, I'm sorry. I've found some related issues: #4950 and #9511. They are both about scan_csv, but you can definitely close this as a duplicate. Just don't forget about Parquet as well :)

s-b90 avatar Aug 11 '23 09:08 s-b90

It would also be great if the scan_* and read_* functions had a unified input type for files, bytes, and so on. It would also be nice if they accepted a list of BytesIO or path-like objects, so that they can be processed in parallel, as with a glob pattern.

Object905 avatar Aug 11 '23 09:08 Object905

My application has Parquet embedded as BLOBs in SQL tables, and processes and combines them lazily. I would love to see support for this - at the moment I have to use read_parquet() and miss out on pushdown optimisations.

adamgreg avatar Sep 29 '23 16:09 adamgreg

A similar use case here. We have a bunch of Parquet files in memory I want to work with, without having all of them in memory at the same time.

aberres avatar Jan 10 '24 14:01 aberres

I would be very happy with this improvement. I have about a million Parquet files stored as binaries in Redis, and I want to read them as LazyFrames to save memory.

shoz avatar Jan 24 '24 15:01 shoz