Pandas gzip export not supported as source file

Open peterdesmet opened this issue 8 years ago • 5 comments

I've created a source file for a dataset by exporting a pandas DataFrame:

data.to_csv('../data/processed/vissen-natuurpunt-occurrences.tsv.zip', sep='\t', index=False, compression='gzip')

pandas.DataFrame.to_csv supports gzip, bz2 or xz, see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

However, if I upload this file as a source file in the IPT, I get the error:

Unsupported compression format. Please use zip, gzip, or plain text files.

Any idea why the pandas gzip is not supported?

The file I used is: vissen-natuurpunt-occurrences.tsv.zip

/cc @stijnvanhoey

peterdesmet, Oct 25 '16 18:10

Could it be possible that the IPT does not like to work with filename.tsv.zip? Maybe try filename.zip

stijnvanhoey, Oct 25 '16 21:10

Good suggestion, but I get the same error for:

  • Manually changing the extension to filename.zip
  • Exporting in pandas as filename.zip
  • Exporting in pandas as filename.txt.zip

The only method that works is unzipping the export locally and zipping it again (with Mac OS X default compress functionality). That file is recognized and unpacked by the IPT. So pandas gzip must be something different?
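One way to check whether the two files really are "something different" is to look at their leading bytes. The sketch below uses only the Python standard library; `sniff_compression` and the sample data are hypothetical names made up for illustration, and nothing here is taken from the IPT's own detection logic. It relies on two well-known signatures: gzip streams start with the bytes 1f 8b, while a non-empty zip archive starts with "PK" 03 04.

```python
import gzip
import io
import zipfile


def sniff_compression(data: bytes) -> str:
    """Guess the container format from the leading magic bytes."""
    if data[:2] == b"\x1f\x8b":      # gzip stream signature
        return "gzip"
    if data[:4] == b"PK\x03\x04":    # zip local file header signature
        return "zip"
    return "unknown"


# Hypothetical sample content standing in for the exported TSV.
tsv = b"id\tscientificName\n1\tPerca fluviatilis\n"

# A gzip stream: a single compressed payload, no archive directory.
gz_bytes = gzip.compress(tsv)

# A zip archive: a container holding one named member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("occurrences.tsv", tsv)
zip_bytes = buf.getvalue()

print(sniff_compression(gz_bytes))
print(sniff_compression(zip_bytes))
```

To inspect a file on disk, read its first four bytes with `open(path, "rb").read(4)` and pass them to the same function. If the pandas export reports a different format than the re-zipped file, the extension and the actual content do not match.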

peterdesmet, Oct 26 '16 07:10

Can you check with Python's built-in gzip module? https://docs.python.org/3.5/library/gzip.html
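For reference, a minimal sketch of what writing the file with the standard library could look like, once with the `gzip` module and once with the `zipfile` module. The file names, the sample row, and the temporary output directory are assumptions for illustration; whether the IPT accepts either result is not established here.

```python
import gzip
import os
import tempfile
import zipfile

# Hypothetical sample content standing in for the real occurrence data.
tsv_text = "id\tscientificName\n1\tPerca fluviatilis\n"
out_dir = tempfile.mkdtemp()

# gzip: a single compressed stream, conventionally named with a .gz extension.
gz_path = os.path.join(out_dir, "occurrences.tsv.gz")
with gzip.open(gz_path, "wt", encoding="utf-8", newline="") as fh:
    fh.write(tsv_text)

# zip: an archive container holding the TSV as a named member.
zip_path = os.path.join(out_dir, "occurrences.zip")
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("occurrences.tsv", tsv_text)

# zipfile.is_zipfile checks the content, not the file extension.
print(zipfile.is_zipfile(zip_path))
print(zipfile.is_zipfile(gz_path))
```

The two outputs are different formats even though both are "compressed": renaming a gzip stream to `.zip` does not turn it into a zip archive.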

stijnvanhoey, Oct 26 '16 07:10

Nope, that didn't help either...

peterdesmet, Oct 26 '16 14:10

I've experienced something similar, but at the other end: Windows machines unable to decompress a DwC-A from the IPT. I suspect that in this case the older version of Java used in the IPT is the root cause.

dshorthouse, Oct 26 '16 15:10