
[BUG-REPORT] Problems reading a parquet file

Open ArtemkaDS opened this issue 1 year ago • 3 comments

Description: I got the following error when trying to read a parquet file:

Cannot open part-00077-c3446b0f-b1e8-469e-9f3d-4441e1651aa6.c000.snappy.parquet nobody knows how to read it.

Any thoughts on how to fix it?

Software information

  • Vaex version (import vaex; vaex.__version__): last one
  • Vaex was installed via: pip / conda-forge / from source
  • OS: Win10

ArtemkaDS avatar Aug 19 '22 14:08 ArtemkaDS

Can you open it with pandas?

JovanVeljanoski avatar Aug 23 '22 08:08 JovanVeljanoski

Same issue here: the file opens fine with pandas, but vaex can't open it.

n0k0m3 avatar Aug 24 '22 13:08 n0k0m3

Can anyone provide an example of how that file was generated?

Edit: Or an example of how to generate a small parquet file with some random data that vaex has trouble opening, so it can be used for testing.

I tried this:

import vaex

# Reuse the file name from the report to rule out a naming problem.
path = 'part-00077-c3446b0f-b1e8-469e-9f3d-4441e1651aa6.c000.snappy.parquet'

# Write the vaex example dataset to a snappy-compressed parquet file via pandas.
df = vaex.example().to_pandas_df()
df.to_parquet(path, compression='snappy')

# Read the file back with vaex.
vaex.open(path)

which works just fine.
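For anyone trying to narrow this down, one quick sanity check is whether the failing file is a complete parquet file at all. The sketch below relies only on the fact that a regular (unencrypted) parquet file starts and ends with the 4-byte marker `PAR1`; the helper name `looks_like_parquet` is made up for illustration and is not part of vaex or pandas.

```python
import os

# Magic bytes that open and close a regular (unencrypted) parquet file.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the parquet magic bytes.

    Hypothetical helper for illustration; it does not validate the
    footer metadata or the column data, only the two markers.
    """
    # Too short to hold both the leading and the trailing marker.
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        # Jump to the last 4 bytes of the file.
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

If this returns False for the file in question, the file is probably truncated or not parquet in the first place. If it returns True, the file is at least framed correctly and the problem more likely lies in how it is being opened.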

JovanVeljanoski avatar Aug 25 '22 21:08 JovanVeljanoski