mzTab too verbose with repeated data.
Description of feature
@daichengxin @jpfeuffer :
I have found that mzTab files (especially for big projects) are really huge: an mzTab built from thousands of mzMLs can end up with a lot of columns full of null values. In terms of storage, this can have a huge impact. For example, PXD016999: mzTab (uncompressed) 160GB -> mzTab (gzip) 2GB.
We should implement a last step in the pipeline (it can be optional, e.g. a --compress option that compresses big files) to gzip some of the files:
- mzTab
- mzML
What do you think, @jpfeuffer @timosachsenberg @daichengxin?
I think it is a good idea. How long does compression take @ypriverol ?
Maybe it is better to do it in each step where the mztab is produced. This way you never have to transfer so much data (e.g. if you run on the cloud)
It takes some time (minutes). I think it would be nice, as you recommended, if every step that outputs an mzTab compressed the file when exporting it.
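To make the per-step idea concrete, here is a minimal sketch of what compressing an output file right after it is written could look like. It uses only the Python standard library; the function name `gzip_file` and the `keep_original` parameter are hypothetical and not part of the pipeline, and the actual implementation might simply call `gzip` from the step's script instead.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, keep_original=False, chunk_size=1024 * 1024):
    """Stream-compress `src` to `<src>.gz` and return the new path.

    Hypothetical helper: the file is copied in chunks, so a very large
    mzTab is never loaded into memory at once.
    """
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=6) as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    if not keep_original:
        # Drop the uncompressed file so only the .gz is staged or transferred.
        src.unlink()
    return dst
```

Called at the end of a step that writes an mzTab, e.g. `gzip_file("out.mzTab")`, this would leave only `out.mzTab.gz` behind, so later steps and cloud transfers would handle the compressed file.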