
mzTab too verbose, with repeated data.

Open • ypriverol opened this issue 2 years ago • 2 comments

Description of feature

@daichengxin @jpfeuffer :

I have found that mzTab files, especially for big projects, are really huge: an mzTab built from thousands of mzMLs can end up with many columns full of null values. This has a huge impact on storage. For example, for PXD016999 the uncompressed mzTab is 160 GB, while the gzipped mzTab is 2 GB.
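
To illustrate why sparse, null-heavy tables compress so well, here is a minimal sketch that builds a synthetic, mzTab-like PSM table and measures its gzip ratio in memory. The `PSH`/`PSM` line prefixes follow mzTab, but the per-run column names (`run[i]_abundance`) and the data are hypothetical and purely illustrative; the ratio it prints is not the PXD016999 figure.

```python
import gzip


def synthetic_mztab_rows(n_rows: int, n_runs: int) -> str:
    """Build a tab-separated, mzTab-like table (hypothetical columns).

    Each row has one column per run, but only one of them is filled;
    the rest hold the literal string "null", mimicking the sparse
    per-run columns seen in large projects.
    """
    header = ["PSH", "sequence"] + [f"run[{i}]_abundance" for i in range(1, n_runs + 1)]
    lines = ["\t".join(header)]
    for row in range(n_rows):
        values = ["null"] * n_runs
        values[row % n_runs] = f"{row * 1.5:.2f}"  # a single non-null value per row
        lines.append("\t".join(["PSM", f"PEPTIDE{row % 97}"] + values))
    return "\n".join(lines) + "\n"


def gzip_ratio(text: str) -> float:
    """Return uncompressed size divided by gzip-compressed size, in bytes."""
    raw = text.encode("utf-8")
    return len(raw) / len(gzip.compress(raw))


if __name__ == "__main__":
    table = synthetic_mztab_rows(n_rows=2000, n_runs=200)
    print(f"compression ratio: {gzip_ratio(table):.1f}x")
```

The long runs of repeated `null\t` tokens are exactly what DEFLATE's back-references handle best, which is why the ratio grows with the number of mostly empty per-run columns.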

We should implement a final step in the pipeline (it could be optional, e.g. a --compress option that compresses big files) to gzip some of the files:

  • mzTab
  • mzML
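
A final compression step of this kind could look roughly like the sketch below. It is a hypothetical stand-alone script, not an existing quantms module: the `--compress` flag is only the option proposed above, and the function names are illustrative. Files are streamed in 1 MiB chunks so a 160 GB mzTab never has to fit in memory.

```python
from __future__ import annotations

import argparse
import gzip
import shutil
from pathlib import Path

# Hypothetical set of extensions to compress (compared case-insensitively).
COMPRESS_SUFFIXES = {".mztab", ".mzml"}


def gzip_file(path: Path, remove_original: bool = True) -> Path:
    """Stream-compress `path` to `path.gz` without loading it into memory."""
    target = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst, 1024 * 1024)  # copy in 1 MiB chunks
    if remove_original:
        path.unlink()
    return target


def compress_outputs(out_dir: Path) -> list[Path]:
    """Gzip every mzTab/mzML file directly inside `out_dir`."""
    compressed = []
    for path in sorted(out_dir.iterdir()):  # sorted() snapshots the listing first
        if path.is_file() and path.suffix.lower() in COMPRESS_SUFFIXES:
            compressed.append(gzip_file(path))
    return compressed


def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Hypothetical final compression step.")
    parser.add_argument("out_dir", type=Path)
    parser.add_argument("--compress", action="store_true", help="gzip mzTab/mzML outputs")
    args = parser.parse_args(argv)
    if args.compress:
        for path in compress_outputs(args.out_dir):
            print(f"compressed {path}")


if __name__ == "__main__":
    main()
```

Usage would be something like `python compress_outputs.py results/ --compress`; without the flag the script does nothing, which keeps the step opt-in as suggested.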

What do you think, @jpfeuffer @timosachsenberg @daichengxin?

ypriverol, Jan 11 '23 16:01

I think it is a good idea. How long does compression take @ypriverol ?

Maybe it is better to do it in each step where the mzTab is produced. That way, you never have to transfer so much data (e.g. if you run on the cloud).

jpfeuffer, Jan 11 '23 21:01

It takes some time (minutes). As you recommended, I think it would be nice if the steps that output mzTab compressed the file when exporting it.
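
Compressing at export time could be as simple as writing the mzTab through a gzip stream instead of a plain file. The sketch below assumes a Python writer and uses hypothetical helper names (`open_output`, `open_input`, `write_mztab`); it is not how the current quantms/OpenMS exporters work. Choosing the output name by its `.gz` suffix keeps the switch transparent to callers, and a `compresslevel` lower than gzip's default of 9 (for example 6) is usually faster for only a small loss in size, which may help with the "takes some time" concern.

```python
from __future__ import annotations

import gzip
from pathlib import Path
from typing import IO, Iterable


def open_output(path: Path, compresslevel: int = 6) -> IO[str]:
    """Open a text output; gzip on the fly if the file name ends in .gz."""
    if path.suffix == ".gz":
        return gzip.open(path, "wt", compresslevel=compresslevel, encoding="utf-8", newline="\n")
    return path.open("w", encoding="utf-8", newline="\n")


def open_input(path: Path) -> IO[str]:
    """Open a text input, transparently decompressing .gz files."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return path.open("r", encoding="utf-8")


def write_mztab(path: Path, rows: Iterable[list[str]]) -> None:
    """Write tab-separated mzTab lines, compressed if `path` ends in .gz."""
    with open_output(path) as fh:
        for fields in rows:
            fh.write("\t".join(fields) + "\n")
```

Because the file is never written uncompressed, the intermediate data moved between pipeline steps (or cloud storage buckets) stays small, which is the point raised above.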

ypriverol, Jan 12 '23 08:01