Regression: pandas.read_parquet hangs when using filprofiler 2022.09.0

Open kdebrab opened this issue 2 years ago • 4 comments

I hope the following is sufficient for reproducing the issue.

Writing with df.to_parquet works fine; the code hangs when reading the data back with pd.read_parquet. The parquet engine used is pyarrow. No error is raised; the Docker container simply hangs forever.

python: 3.10.7
OS: Linux
pandas: 1.4.4
numpy: 1.23.3
pyarrow: 9.0.0

Disabling filprofiler (I use the api with a conditional environment variable as documented in https://pythonspeed.com/fil/docs/api.html#using-the-python-api) resolves the issue. Also reverting to filprofiler 2022.06.0 (with everything else exactly the same) resolves the issue.
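For readers trying to reproduce this, the setup described above might look roughly like the sketch below. This is not the reporter's actual code: the environment variable name `ENABLE_FIL_PROFILING`, the helper names, the DataFrame contents, and the file name are all assumptions made for illustration. The only Fil call used is `filprofiler.api.profile(callable, output_path)` from the linked API docs.

```python
import os
from pathlib import Path


def run_maybe_profiled(func, output_dir, env_var="ENABLE_FIL_PROFILING"):
    """Call func(), under Fil only when env_var (hypothetical name) is "1"."""
    if os.environ.get(env_var) != "1":
        return func()
    # Imported lazily so the unprofiled path does not need filprofiler.
    from filprofiler.api import profile

    # Capture the result ourselves rather than relying on profile()'s return value.
    captured = []
    profile(lambda: captured.append(func()), output_dir)
    return captured[0]


def parquet_roundtrip(directory):
    """Write a small DataFrame to parquet and read it back with pyarrow."""
    import pandas as pd  # requires pandas and pyarrow to be installed

    path = Path(directory) / "example.parquet"  # hypothetical file name
    df = pd.DataFrame({"a": range(1000), "b": [float(i) for i in range(1000)]})
    df.to_parquet(path, engine="pyarrow")
    # This is the call reported to hang with filprofiler 2022.09.0.
    return pd.read_parquet(path, engine="pyarrow")
```

To exercise the profiled path, a script would call something like `run_maybe_profiled(lambda: parquet_roundtrip(some_dir), "fil-result")` and be launched as `ENABLE_FIL_PROFILING=1 fil-profile python script.py`. Whether this small example actually triggers the hang is untested; the real data or environment may matter.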

kdebrab avatar Sep 20 '22 16:09 kdebrab

Thanks for the detailed bug report. I will try to reproduce, and if I fail I will ask for more details.

itamarst avatar Sep 20 '22 16:09 itamarst

Hi, I am unable to reproduce with a random parquet file I have lying around. Could you share a minimal reproducer if you can make one? Python script + parquet file, ideally.

itamarst avatar Sep 25 '22 22:09 itamarst

@kdebrab just checking again, would love to get this fixed.

itamarst avatar Sep 28 '22 16:09 itamarst

@kdebrab could you provide a reproducer?

itamarst avatar Oct 25 '22 13:10 itamarst