Add more QC metrics for pmultiQC and use mzQC files

Open jpfeuffer opened this issue 2 years ago • 1 comments

Description of feature

My plan is to use OpenMS' QCCalculator additionally in (almost) each step to create small mzQC files with additional summaries. Those mzQC files should contain only stuff that cannot be read from the final mzTab. This would also allow skipping the copying of the input mzMLs to the pmultiqc step since it just needs to read the already summarized data in the mzQC.
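
As a rough illustration of what such a small per-step summary could contain, the following Python sketch builds an mzQC-style JSON document from values that are already computed. It does not call QCCalculator. The function name `build_mzqc_summary`, the accession `QC:0000000`, and the file name are placeholders, and the top-level layout (`mzQC`, `runQualities`, `qualityMetrics`) is an assumption that should be checked against the mzQC specification.

```python
import json
from datetime import datetime, timezone


def build_mzqc_summary(input_file, metrics):
    """Return an mzQC-style dict for one run (hypothetical helper).

    `metrics` maps a placeholder accession to a (name, value) pair.
    """
    quality_metrics = [
        {"accession": acc, "name": name, "value": value}
        for acc, (name, value) in sorted(metrics.items())
    ]
    return {
        "mzQC": {
            "version": "1.0.0",  # assumed version string
            "creationDate": datetime.now(timezone.utc).isoformat(),
            "runQualities": [
                {
                    "metadata": {"inputFiles": [{"name": input_file}]},
                    "qualityMetrics": quality_metrics,
                }
            ],
        }
    }


# Placeholder accession and value, for illustration only.
summary = build_mzqc_summary(
    "sample1.mzML",
    {"QC:0000000": ("number of spectra (placeholder accession)", 12345)},
)
print(json.dumps(summary, indent=2))
```

A file like this is small enough that pmultiqc would only need to parse JSON instead of reading the mzMLs.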

Please list places and metrics that we need to extract in the comments @ypriverol @timosachsenberg

MzMLs (run QCCalculator during mzML Indexing step?):

  • Export all metrics that our QC classes can do
  • Export number of spectra per file

idXMLs (per Search engines):

  • score distributions target vs decoy
    • Which scores to export?
    • Best hit only?
    • histogram or full density?
  • nr targets vs decoys
  • hits per psm?
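
One possible answer to the "histogram or full density" question is a fixed-bin histogram, which keeps the exported data small. The sketch below assumes best hit only and a score already scaled to a known range; the function name `score_histogram` and the `(score, is_decoy)` input format are hypothetical and not tied to the idXML API.

```python
def score_histogram(psms, n_bins=10, lo=0.0, hi=1.0):
    """Bin best-hit scores separately for targets and decoys.

    `psms` is an iterable of (score, is_decoy) pairs, one per spectrum.
    Scores outside [lo, hi] are clamped into the first or last bin.
    """
    width = (hi - lo) / n_bins
    counts = {"target": [0] * n_bins, "decoy": [0] * n_bins}
    for score, is_decoy in psms:
        clamped = min(max(score, lo), hi)
        idx = min(int((clamped - lo) / width), n_bins - 1)  # score == hi goes to the last bin
        counts["decoy" if is_decoy else "target"][idx] += 1
    return counts


hist = score_histogram(
    [(0.05, True), (0.95, False), (1.0, False), (0.5, True)], n_bins=2
)
# The number of targets vs decoys falls out of the same counts.
n_targets = sum(hist["target"])
n_decoys = sum(hist["decoy"])
```

The histogram also covers the "nr targets vs decoys" item, since the totals are just the sums over the bins.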

idXMLs (after Perc/IDPEP):

  • target vs decoy distribution again

idXMLs (after consensusID):

  • overlap between search engines (e.g. 2D plot for every pair of search engines)
  • histogram of number of times a psm was identified with same, with different, ...
  • nr targets vs decoys
  • hits per psm?
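
The pairwise overlap could be summarized with simple counts per pair of search engines. This is a minimal sketch under the assumption that each engine's best hits are available as a mapping from spectrum ID to peptide sequence; `pairwise_overlap` and the engine names are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(ids_by_engine):
    """Count shared and engine-specific PSMs for every pair of engines.

    `ids_by_engine` maps an engine name to a dict {spectrum_id: peptide}.
    """
    result = {}
    for a, b in combinations(sorted(ids_by_engine), 2):
        ids_a, ids_b = ids_by_engine[a], ids_by_engine[b]
        shared = ids_a.keys() & ids_b.keys()
        same = sum(1 for s in shared if ids_a[s] == ids_b[s])
        result[(a, b)] = {
            "same_peptide": same,
            "different_peptide": len(shared) - same,
            "only_" + a: len(ids_a) - len(shared),
            "only_" + b: len(ids_b) - len(shared),
        }
    return result


overlap = pairwise_overlap(
    {
        "engineA": {"s1": "PEP", "s2": "TIDE", "s3": "AAA"},
        "engineB": {"s1": "PEP", "s2": "OTHER", "s4": "BBB"},
    }
)
```

Counts like these are enough for a per-pair bar or Venn-style plot; a full 2D score plot would need the per-PSM scores as well.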

idXMLs (after filtering):

  • do we need anything here?

idXMLs (after inference):

  • see #27
  • whether this can be inferred by comparing the mzTab with the raw IDs per file depends a bit on the order of FDR filtering. Currently we do the FDR filter before quantification, so it might indeed be helpful to know whether a protein is missing because of filtering after inference or because of missing quant data.
  • in any case, we need that information, since by default we also filter out decoys, and a target-decoy score distribution plot would be helpful for proteins as well.
  • for TMT the inference idXML is easily accessible

features:

  • since we only generate features internally for ProteomicsLFQ, we must export summarized feature QC metrics during execution (or write out the temporary featureXMLs even without debug mode).
  • for TMT this does not really exist because the "consensus" features are not really 2D features

consensus features:

  • is there anything important that is not available in the mzTab?

jpfeuffer avatar May 09 '22 11:05 jpfeuffer

Let's also keep https://github.com/axelwalter and https://github.com/cbielow in the loop.

timosachsenberg avatar May 09 '22 12:05 timosachsenberg