
Proteins.Identified in report.stats.tsv

Open zhangdong360 opened this issue 1 year ago • 1 comments

Hey, I found that the number of Proteins.Identified in the report.stats.tsv file does not match the number of proteins in the other matrix tables. The number of Proteins.Identified in report.stats.tsv is lower than the counts in the pr, pg and unique_genes matrices, and it is not clear to me how these quantities relate to each other. How do I filter report.tsv or another file to get the Proteins.Identified value in report.stats.tsv? I noticed your description of both in the README, but I still don't quite understand how to obtain the number of proteins reported in the stats report. Kind regards, Dong

zhangdong360 avatar Dec 18 '23 07:12 zhangdong360

Hi Dong,

Different files are produced using different filtering. Please see the output description in the docs. If in doubt, please always just use the main report only. The number of proteins in stats report can be reproduced by:

  • Analysing with FDR set to 5%.
  • Filtering the main report using Protein.Q.Value <= 0.01 & Proteotypic == 1.
  • Calculating the number of unique proteins per run, where 'proteins' refers to what's determined by the 'Protein inference' setting.
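The filtering and counting steps above can be sketched in Python as follows. This is only a minimal illustration, not DIA-NN's own code. `Protein.Q.Value` and `Proteotypic` are the columns named above; the `Run` and `Protein.Group` column names, the function name `proteins_identified_per_run`, and the demo rows are assumptions made for the example. Change `protein_col` if your 'Protein inference' setting means a different column identifies a protein. The 5% FDR setting applies when running the analysis, so it is not part of the script.

```python
import csv
import io
from collections import defaultdict


def proteins_identified_per_run(tsv_file, protein_col="Protein.Group", q_max=0.01):
    """Count unique proteins per run after the filtering described above.

    tsv_file: an open, tab-separated main report (any iterable of lines).
    protein_col: assumed column that identifies a 'protein'.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    proteins_by_run = defaultdict(set)
    for row in reader:
        passes_q = float(row["Protein.Q.Value"]) <= q_max
        is_proteotypic = int(float(row["Proteotypic"])) == 1
        if passes_q and is_proteotypic:
            # A set keeps each protein once per run, however many precursors it has.
            proteins_by_run[row["Run"]].add(row[protein_col])
    return {run: len(proteins) for run, proteins in proteins_by_run.items()}


# Made-up rows for illustration only (not real DIA-NN output).
demo = io.StringIO(
    "Run\tProtein.Group\tProtein.Q.Value\tProteotypic\n"
    "run1\tP1\t0.001\t1\n"
    "run1\tP1\t0.002\t1\n"
    "run1\tP2\t0.02\t1\n"
    "run1\tP3\t0.005\t0\n"
    "run2\tP1\t0.01\t1\n"
    "run2\tP4\t0.003\t1\n"
)
demo_counts = proteins_identified_per_run(demo)
print(demo_counts)
```

For a real report, pass an open file instead of the demo data, for example `with open("report.tsv", newline="") as f: proteins_identified_per_run(f)`, and compare the per-run counts with Proteins.Identified in the stats report.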

Best, Vadim

vdemichev avatar Dec 18 '23 10:12 vdemichev