Pandas gzip export not supported as source file
I've created a source file for a dataset by exporting a pandas DataFrame:
data.to_csv('../data/processed/vissen-natuurpunt-occurrences.tsv.zip', sep='\t', index=False, compression='gzip')
pandas.DataFrame.to_csv supports gzip, bz2 or xz compression (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html).
However, if I upload this file as a source file in the IPT, I get the error:
Unsupported compression format. Please use zip, gzip, or plain text files.
Any idea why the pandas gzip output is not supported?
The file I used is: vissen-natuurpunt-occurrences.tsv.zip
/cc @stijnvanhoey
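One way to see what the exported file actually contains, whatever its extension, is to look at its first bytes. gzip streams begin with the bytes 1f 8b, while zip archives begin with PK\x03\x04. The sketch below uses only the Python standard library. sniff_compression is a hypothetical helper name, it only recognizes the four signatures listed, and it says nothing about how the IPT itself detects formats, which is not known from this thread.

```python
# Leading-byte signatures of common compression/archive formats.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),    # zip local file header (non-empty archive)
    (b"\x1f\x8b", "gzip"),     # gzip member header (RFC 1952)
    (b"BZh", "bz2"),           # bzip2 stream header
    (b"\xfd7zXZ\x00", "xz"),   # xz stream header
]


def sniff_compression(path):
    """Return the format detected from the file's leading bytes, or 'plain'
    if none of the known signatures match (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(6)  # the longest signature above is 6 bytes
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "plain"
```

For a file written with compression='gzip', such as vissen-natuurpunt-occurrences.tsv.zip, this should report gzip, since the .zip extension does not change the bytes pandas writes.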
Could it be that the IPT does not like working with filename.tsv.zip? Maybe try filename.zip.
Good suggestion, but I get the same error for:
- Manually changing the extension to filename.zip
- Exporting in pandas as filename.zip
- Exporting in pandas as filename.txt.zip
The only method that works is unzipping the export locally and zipping it again (with the default Mac OS X Compress functionality). That file is recognized and unpacked by the IPT. So the pandas gzip output must be something different?
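The manual workaround (decompress, then re-compress with Mac OS X Compress, which produces a zip archive rather than a gzip stream) can be scripted. This is a minimal sketch assuming Python 3.6+ (needed for ZipFile.open in "w" mode); gzip_to_zip and its parameter names are hypothetical. It only converts a gzip stream into a single-member zip archive and does not change anything on the IPT side.

```python
import gzip
import shutil
import zipfile


def gzip_to_zip(gz_path, zip_path, member_name):
    """Re-package a gzip-compressed file as a one-member zip archive.

    Hypothetical helper: reads the gzip stream at gz_path, decompresses it on
    the fly and writes the plain bytes into zip_path under member_name.
    """
    with gzip.open(gz_path, "rb") as src, zipfile.ZipFile(
        zip_path, "w", compression=zipfile.ZIP_DEFLATED
    ) as archive:
        # ZipFile.open(..., "w") streams the member, so the decompressed
        # data is copied in chunks rather than loaded into memory at once.
        with archive.open(member_name, "w") as dst:
            shutil.copyfileobj(src, dst)
```

For example, gzip_to_zip("../data/processed/vissen-natuurpunt-occurrences.tsv.zip", "../data/processed/vissen-natuurpunt-occurrences.zip", "vissen-natuurpunt-occurrences.tsv"). Alternatively, writing the TSV uncompressed from pandas and adding it with ZipFile.write avoids the intermediate gzip file; more recent pandas versions also accept compression='zip' in to_csv.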
Can you check with Python's standard-library gzip module? https://docs.python.org/3.5/library/gzip.html
Nope, that didn't help either...
I've experienced something similar, but at the other end: Windows machines unable to decompress DwC-A archives exported from the IPT. I suspect that in this case the older version of Java used by the IPT is the root cause.