polars
Scan zipped files
Problem description
I wish I could use polars to scan zipped csv (and more?) files.
This example works with read_csv but fails with scan_csv:
import os
import shutil
import zipfile

import pandas as pd
import polars as pl

df = pd.DataFrame({'col': [126.3263, 45.23874]})
# create zip
os.mkdir('tmp')
df.to_csv('./tmp/tmp.csv')
shutil.make_archive('myzip', 'zip', 'tmp')
# try to read zipped_file
with zipfile.ZipFile('myzip.zip') as zipFile:
    # works with pl.read_csv, fails with pl.scan_csv
    df = pl.scan_csv(zipFile.read('tmp.csv'))
Scan needs to receive a path, whereas a zipfile requires supplying Polars with a file handle to the internal file location, because your zip could contain more than one file. Even for files you can get an unambiguous path to, though, like csv.gz and csv.xz, scan_csv will actually refuse to read them and ask you to use read_csv instead (see https://github.com/pola-rs/polars/issues/7287).
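Since scan wants a path, one possible workaround is to extract the zip member to disk first and scan the extracted file. A minimal sketch using only the standard library; extract_member is a hypothetical helper (not part of Polars), and the demo archive just mirrors the one from the example above:

```python
import tempfile
import zipfile
from pathlib import Path


def extract_member(zip_path: str, member: str, dest_dir: str) -> Path:
    """Extract one named member of a zip archive and return its on-disk path."""
    with zipfile.ZipFile(zip_path) as archive:
        # ZipFile.extract returns the path of the file it wrote.
        return Path(archive.extract(member, path=dest_dir))


# Demo: build a small archive, then extract its only member.
workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "myzip.zip"
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("tmp.csv", ",col\n0,126.3263\n1,45.23874\n")

csv_path = extract_member(str(archive_path), "tmp.csv", str(workdir / "extracted"))
# csv_path is now an ordinary file path; assumption: it can be handed to
# pl.scan_csv(csv_path) like any other uncompressed CSV.
```

This costs a temporary copy on disk, but the member has to be named explicitly, so the "zip could contain more than one file" ambiguity goes away.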
read_csv can read single compressed files just fine. But when globbing, read_csv ends up calling scan_csv, which then gives up on the compressed files.
Not sure why this doesn't work in the current implementation.
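Until globbing over compressed files works, one option is to decompress them into a scratch directory and glob over the plain copies. A rough sketch with the standard library only; decompress_gz_files and the file names are made up for illustration, and it only handles gzip:

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_gz_files(src_dir: Path, pattern: str, dest_dir: Path) -> list[Path]:
    """Decompress every gzip file in src_dir matching pattern into dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(src_dir.glob(pattern)):
        target = dest_dir / gz_path.stem  # "a.csv.gz" -> "a.csv"
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files are not held in memory
        written.append(target)
    return written


# Demo: write two small gzip-compressed CSVs, then decompress both.
workdir = Path(tempfile.mkdtemp())
for name, body in [("a", b"col\n1\n"), ("b", b"col\n2\n")]:
    with gzip.open(workdir / f"{name}.csv.gz", "wb") as fh:
        fh.write(body)

plain_files = decompress_gz_files(workdir, "*.csv.gz", workdir / "plain")
```

The plain directory should then be usable with a glob pattern such as str(workdir / "plain" / "*.csv") in scan_csv, at the price of extra disk space.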