
Unable to open gzipped CSV's [BUG-REPORT]

Open jaredbgo opened this issue 2 years ago • 2 comments

I am trying to use vaex.open() on a gzipped CSV.

Getting this error:

OSError: Cannot open ./folder/filename.csv.gz nobody knows how to read it.

Software information

  • Vaex version (import vaex; vaex.__version__): 4.9
  • Vaex was installed via: pip

Additional information

Pandas is able to read gzipped CSVs without issue. Do you plan on adding support at any point? Also, a more descriptive error for this use case would be helpful: 'nobody knows how to read it' is a little vague, although I understand you are likely using this as an umbrella error for many unsupported formats.

The same issue arises when using vaex.open_many(). However, opening the file with vaex.from_csv and specifying compression='GZIP' was successful.

jaredbgo, Jun 01 '22 17:06

This is somewhat related to: https://github.com/vaexio/vaex/issues/1879

In a nutshell, for compressed files (csv, json) you need to use the right reader method and specify the compression type. Essentially, there is currently no better way to open the file than the one you've found.
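
As a rough illustration of that workaround, here is a minimal sketch. It assumes that vaex.from_csv forwards extra keyword arguments to pandas.read_csv, and it uses the lowercase 'gzip' spelling that pandas documents for its compression parameter. The helper names (is_gzipped, csv_read_kwargs, open_csv) are hypothetical and not part of vaex; the gzip check relies only on the two-byte gzip magic number.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def csv_read_kwargs(path):
    """Hypothetical helper: pick reader keyword arguments for a CSV file.

    Returns {"compression": "gzip"} for gzipped files and {} otherwise.
    """
    return {"compression": "gzip"} if is_gzipped(path) else {}


def open_csv(path):
    """Hypothetical helper: open a plain or gzipped CSV as a vaex DataFrame.

    Assumption: vaex.from_csv passes extra keyword arguments through to
    pandas.read_csv, so `compression` is handled by pandas.
    """
    import vaex  # imported lazily so the helpers above work without vaex

    return vaex.from_csv(str(path), **csv_read_kwargs(path))
```

With these helpers, open_csv("./folder/filename.csv.gz") would take the from_csv route with an explicit compression type instead of going through vaex.open().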

@maartenbreddels maybe we can consider adding the csv reader to the list of openers to try as the final fallback?

JovanVeljanoski, Jun 01 '22 17:06

I think we need to explore the option mentioned in https://github.com/vaexio/vaex/issues/1879#issuecomment-1033691134 first.

maartenbreddels, Aug 30 '22 07:08

I believe this is now possible in the new release, thanks to @maartenbreddels.

Please re-open if needed.

JovanVeljanoski, Sep 26 '22 14:09