polars
Scanning and sinking a file without changing the file name causes a crash
Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Issue description
Scanning and sinking a file with the same name crashes the system.
Reproducible example
import polars as pl
df = pl.DataFrame({
'a': [1,2,3]
})
df.write_parquet('df.parquet')
# Works
pl.read_parquet('df.parquet').write_parquet('df.parquet')
# Works
pl.scan_parquet('df.parquet').collect(streaming=True).write_parquet('df.parquet')
# Crashes the system
pl.scan_parquet('df.parquet').sink_parquet('df.parquet')
# Works
pl.scan_parquet('df.parquet').sink_parquet('df2.parquet')
Expected behavior
I would expect an error message stating that the source and sink cannot be the same file.
Installed versions
---Version info---
Polars: 0.16.16
Index type: UInt32
Platform: Linux-5.10.147+-x86_64-with-glibc2.31
Python: 3.9.16 (main, Dec 7 2022, 01:11:51)
[GCC 9.4.0]
---Optional dependencies---
numpy: 1.22.4
pandas: 1.4.4
pyarrow: 11.0.0
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2023.3.0
matplotlib: 3.7.1
xlsx2csv: <not installed>
xlsxwriter: <not installed>
Yes, the OS will complain that we write to a file we opened as mmap. We can add a check for this.
sounds good. I think an error message is all that's needed.
It took me some time to understand why streaming was not working. But it makes sense that you cannot stream from and to the same file.
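Until such a check exists in Polars itself, a user-side guard could look roughly like the sketch below. The helper name `ensure_distinct_paths` is hypothetical and not part of the Polars API; it uses only the standard library and assumes both source and sink are local file paths.

```python
import os
from pathlib import Path


def ensure_distinct_paths(source, sink):
    """Raise ValueError if `source` and `sink` point at the same file.

    Hypothetical helper, not part of Polars. Assumes local file paths.
    """
    src = Path(source).resolve()
    dst = Path(sink).resolve()

    # Resolved paths catch relative vs. absolute spellings and symlinks.
    same = src == dst

    # If both files already exist, also compare device/inode so that
    # hard links to the same file are detected.
    if not same and src.exists() and dst.exists():
        same = os.path.samefile(src, dst)

    if same:
        raise ValueError(
            f"source and sink refer to the same file: {src}"
        )
```

Calling `ensure_distinct_paths('df.parquet', 'df.parquet')` before `pl.scan_parquet('df.parquet').sink_parquet('df.parquet')` would then raise a `ValueError` instead of reaching the streaming write, while the `'df2.parquet'` case passes through unchanged.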
I shall link to this: https://www.urbandictionary.com/define.php?term=Don%27t%20shit%20where%20you%20eat
^^
that would make for a very memorable error message :)