Scanning and sinking a file without changing the file name causes a crash

Open lucazanna opened this issue 1 year ago • 4 comments

Polars version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Issue description

Scanning a file and sinking to the same file name crashes the system.

Reproducible example

import polars as pl

df = pl.DataFrame({
    'a': [1,2,3]
})

df.write_parquet('df.parquet')

# Works
pl.read_parquet('df.parquet').write_parquet('df.parquet')

# Works
pl.scan_parquet('df.parquet').collect(streaming=True).write_parquet('df.parquet')

# Crashes the system
pl.scan_parquet('df.parquet').sink_parquet('df.parquet')

# Works
pl.scan_parquet('df.parquet').sink_parquet('df2.parquet')

Expected behavior

I would expect an error message stating that the source and the sink cannot be the same file.

Installed versions

---Version info---
Polars: 0.16.16
Index type: UInt32
Platform: Linux-5.10.147+-x86_64-with-glibc2.31
Python: 3.9.16 (main, Dec  7 2022, 01:11:51) 
[GCC 9.4.0]
---Optional dependencies---
numpy: 1.22.4
pandas: 1.4.4
pyarrow: 11.0.0
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2023.3.0
matplotlib: 3.7.1
xlsx2csv: <not installed>
xlsxwriter: <not installed>

lucazanna avatar Mar 28 '23 17:03 lucazanna

Yes, the OS will complain that we write to a file we opened as mmap. We can add a check for this.
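
Until such a check exists, a user-side guard could look roughly like the sketch below. This is not the Polars implementation: `same_file` and `check_source_and_sink` are hypothetical helper names, and the code only uses the Python standard library. It assumes both source and sink are local filesystem paths.

```python
import os
from pathlib import Path


def same_file(source, sink):
    """Return True if both paths refer to the same file on disk."""
    src, dst = Path(source), Path(sink)
    if src.exists() and dst.exists():
        # samefile compares the underlying file identity, so symlinks
        # and hard links to the same file are caught as well.
        return os.path.samefile(src, dst)
    # The sink may not exist yet; fall back to comparing resolved paths.
    return src.resolve() == dst.resolve()


def check_source_and_sink(source, sink):
    """Hypothetical guard: raise before any data is written if the paths collide."""
    if same_file(source, sink):
        raise ValueError(
            f"cannot sink to {str(sink)!r}: it is also the file being scanned"
        )
```

Calling `check_source_and_sink('df.parquet', 'df.parquet')` before `pl.scan_parquet('df.parquet').sink_parquet('df.parquet')` would then raise a `ValueError` instead of reaching the crashing call.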

ritchie46 avatar Mar 30 '23 06:03 ritchie46

Sounds good. I think an error message is all that's needed.

It took me some time to understand why streaming was not working. But it makes sense that you cannot stream from and to the same file.
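
For anyone who does need to end up with the same file name, one possible workaround is to sink to a temporary file in the same directory and move it over the original once writing has finished. The sketch below is only an illustration under assumptions: `rewrite_in_place` is a hypothetical helper, `write` is any callable that writes its output to the path it is given, and it is assumed that the source file is no longer held open when the move happens (this matters mostly on Windows).

```python
import os
import tempfile


def rewrite_in_place(target, write):
    """Call write(tmp_path), then move tmp_path over target."""
    directory = os.path.dirname(os.path.abspath(target))
    # Create the temporary file next to the target so that os.replace
    # stays on one filesystem and does not need to copy data.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write(tmp_path)
        os.replace(tmp_path, target)
    except BaseException:
        # Leave the original file untouched and clean up if anything failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With Polars this might be used as `rewrite_in_place('df.parquet', lambda tmp: pl.scan_parquet('df.parquet').sink_parquet(tmp))`, so the scan and the sink never point at the same path.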

lucazanna avatar Mar 30 '23 11:03 lucazanna

I shall link to this: https://www.urbandictionary.com/define.php?term=Don%27t%20shit%20where%20you%20eat

^^

ritchie46 avatar Mar 30 '23 12:03 ritchie46

That would make for a very memorable error message :)

lucazanna avatar Mar 30 '23 13:03 lucazanna