quantms.io protein groups representation.

Open ypriverol opened this issue 6 months ago • 4 comments

Description of feature

The quantms.io protein group representation is described in the section https://io.quantms.org/#protein. I'm producing quantms.io files for all our datasets using the library. For the DIA workflow, we can now produce the following:

  • feature.parquet
  • psm.parquet
  • pg.parquet

However, we don't have support for the TMT and LFQ workflows, mainly because we rely on MSstats. I want to gradually produce these files by converting the mzTab protein tables to quantms.io. Here are two mzTab files I recently produced for a TMT and an LFQ experiment:

  • TMT: https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute-expression-2025/tissues/PXD016999-second-instrument/quant_tables/ (A quick note: apart from the mzTab, this folder also contains other files, such as the protein and peptide tables from ProteinQuantifier.)
  • LFQ: https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute-expression-2025/tissues/PXD020192/quant_tables/

@timosachsenberg @jpfeuffer, can you help me understand the protein information in these files, so we can make sure that we are exporting the right information to quantms.io?

If you need smaller files, let me know and I can try to produce smaller mzTab examples. These ones have also been updated:

TMT: https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/pmultiqc/example-projects/TMT_PXD007683.zip

LFQ: https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/pmultiqc/example-projects/LFQ_PXD007683.zip

ypriverol, Jun 04 '25 12:06

Do you have specific questions? What is unclear?

jpfeuffer, Jun 04 '25 16:06

Which quant values should I write to quantms.io? Of the three kinds of protein rows we write (including description, ambiguity, etc.), how do we translate them into quantms.io?

ypriverol, Jun 04 '25 16:06

Indistinguishable groups plus single proteins will be your protein groups.

The protein table can be constructed by joining the protein_details rows with the exploded indistinguishable_groups concatenated with single_proteins. EDIT: I see you don't have a protein table. In that case, you might need to explode, join, and gather to have everything in the pg table.
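
The explode, join, and gather steps could look roughly like the sketch below. It is only an illustration on toy rows, not the quantms.io library's implementation: the dictionary keys (`result_type`, `members`, `abundance`), the row-type labels, and the function name `build_pg_table` are all hypothetical and would need to be mapped to the actual mzTab protein section columns.

```python
# Toy rows standing in for an mzTab protein section. The keys and the
# "result_type" labels are hypothetical, chosen only for this sketch.
rows = [
    {"accession": "P1", "result_type": "protein_details", "description": "Protein one"},
    {"accession": "P2", "result_type": "protein_details", "description": "Protein two"},
    {"accession": "P3", "result_type": "protein_details", "description": "Protein three"},
    {"accession": "P1", "result_type": "indistinguishable_group",
     "members": "P1,P3", "abundance": 10.0},
    {"accession": "P2", "result_type": "single_protein",
     "members": "P2", "abundance": 20.0},
]


def build_pg_table(rows):
    """Build one record per protein group from the three row types."""
    # Lookup used for the join: accession -> description.
    details = {
        r["accession"]: r["description"]
        for r in rows
        if r["result_type"] == "protein_details"
    }
    # Concatenate: indistinguishable groups plus single proteins.
    groups = [
        r for r in rows
        if r["result_type"] in ("indistinguishable_group", "single_protein")
    ]
    pg_table = []
    for group in groups:
        # Explode: split the member string into one accession per entry.
        members = group["members"].split(",")
        # Join each member to its details, then gather back per group.
        pg_table.append({
            "pg_accessions": members,
            "descriptions": [details.get(m) for m in members],
            "abundance": group["abundance"],
        })
    return pg_table


pg_table = build_pg_table(rows)
```

With dataframes, the same three steps would typically map to pandas `DataFrame.explode`, `DataFrame.merge`, and `DataFrame.groupby`; plain Python is used here only to keep the sketch dependency-free.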

jpfeuffer, Jun 04 '25 16:06

Feature table = peptide table
PSM table = PSM table

jpfeuffer, Jun 05 '25 09:06