filprofiler
Regression: pandas.read_parquet hangs when using filprofiler 2022.09.0
I hope the following is sufficient for reproducing the issue.
Writing with df.to_parquet goes fine, it's when reading the data back with pd.read_parquet that the code hangs. The parquet engine used is pyarrow. No error is raised, the docker container simply hangs forever.
python: 3.10.7, OS: Linux, pandas: 1.4.4, numpy: 1.23.3, pyarrow: 9.0.0
Disabling filprofiler (I use the API, enabled conditionally via an environment variable, as documented in https://pythonspeed.com/fil/docs/api.html#using-the-python-api) resolves the issue. Reverting to filprofiler 2022.06.0 (with everything else exactly the same) also resolves the issue.
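For illustration, a setup like the one described might look roughly like the sketch below. It is not the code from the actual project: the environment variable name ENABLE_FIL, the helper names and the output directory are all assumptions made up for this example. Only filprofiler.api.profile, df.to_parquet and pd.read_parquet are real calls.

```python
import os


def run_maybe_profiled(func, output_dir="fil-result"):
    """Call func(), under Fil's Python API only when ENABLE_FIL=1.

    ENABLE_FIL is a hypothetical variable name chosen for this sketch.
    """
    if os.environ.get("ENABLE_FIL") == "1":
        # Imported lazily so the script still runs without filprofiler installed.
        from filprofiler.api import profile

        return profile(func, output_dir)
    return func()


def parquet_roundtrip(path="example.parquet"):
    """Write a small DataFrame with pyarrow and read it back.

    Hypothetical stand-in for the real workload; the report says the
    hang occurs in the read step, not the write step.
    """
    import pandas as pd

    df = pd.DataFrame({"a": range(1000), "b": [float(i) for i in range(1000)]})
    df.to_parquet(path, engine="pyarrow")
    return pd.read_parquet(path, engine="pyarrow")


# Example call (commented out so importing this file has no side effects):
# df = run_maybe_profiled(parquet_roundtrip)
```

With ENABLE_FIL unset the function is called directly, which corresponds to the "disabling filprofiler" case above. According to the Fil docs, the API only records anything when the script is started through fil-profile (for example `fil-profile python script.py`).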
Thanks for the detailed bug report. I will try to reproduce, and if I fail I will ask for more details.
Hi, I am unable to reproduce with a random parquet file I have lying around. Could you share a minimal reproducer if you can make one? Python script + parquet file, ideally.
@kdebrab just checking again, would love to get this fixed.
@kdebrab could you provide a reproducer?