quantms
Add more QC metrics for pmultiQC and use mzQC files
Description of feature
My plan is to additionally run OpenMS' QCCalculator in (almost) every step to create small mzQC files with additional summaries. Those mzQC files should contain only information that cannot be read from the final mzTab. This would also allow us to skip copying the input mzMLs to the pmultiqc step, since it would only need to read the already summarized data in the mzQC files.
Please list places and metrics that we need to extract in the comments @ypriverol @timosachsenberg
mzMLs (run QCCalculator during mzML Indexing step?):
- Export all metrics that our QC classes can do
- Export number of spectra per file
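To make the "small mzQC file" idea concrete, here is a minimal sketch of what a per-file summary could look like. It is not QCCalculator output: the function name `build_mzqc_summary`, the file names, the counts and the accession string are all made up for illustration, and the field layout only follows my understanding of the mzQC JSON structure (it is not validated against the schema and omits fields such as the version and controlled vocabularies).

```python
import json
from datetime import datetime, timezone


def build_mzqc_summary(spectra_counts):
    """Build a minimal mzQC-style document from {file name: number of spectra}."""
    run_qualities = []
    for file_name, n_spectra in sorted(spectra_counts.items()):
        run_qualities.append({
            "metadata": {"inputFiles": [{"name": file_name}]},
            "qualityMetrics": [{
                # Placeholder accession (assumption): look up the real CV term before use.
                "accession": "PLACEHOLDER:0000001",
                "name": "number of spectra",
                "value": n_spectra,
            }],
        })
    return {
        "mzQC": {
            "creationDate": datetime.now(timezone.utc).isoformat(),
            "runQualities": run_qualities,
        }
    }


# Hypothetical input: counts that an earlier step would have produced.
doc = build_mzqc_summary({"sample_a.mzML": 41250, "sample_b.mzML": 39871})
print(json.dumps(doc, indent=2))
```

pmultiqc would then only need to parse this JSON instead of opening the mzMLs themselves.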
idXMLs (per search engine):
- score distributions target vs decoy
- Which scores to export?
- Best hit only?
- histogram or full density?
- number of targets vs. decoys
- hits per PSM?
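As a starting point for the "histogram or full density?" and "best hit only?" questions, a binned target/decoy summary could be computed roughly as below. This is a plain-Python sketch that does not use the OpenMS API; the tuple layout `(spectrum_id, rank, score, is_decoy)`, the function name and the example hits are assumptions made for illustration.

```python
def target_decoy_histogram(hits, n_bins=4, best_hit_only=True):
    """Bin scores separately for targets and decoys.

    hits: list of (spectrum_id, rank, score, is_decoy) tuples (assumed layout).
    Returns (bin_edges, target_counts, decoy_counts).
    """
    if best_hit_only:
        # Keep only the top-ranked hit of each spectrum.
        hits = [h for h in hits if h[1] == 1]
    scores = [h[2] for h in hits]
    if not scores:
        return [], [], []
    lo, hi = min(scores), max(scores)
    # Fall back to a width of 1.0 if all scores are identical.
    width = (hi - lo) / n_bins or 1.0
    edges = [lo + i * width for i in range(n_bins + 1)]
    targets = [0] * n_bins
    decoys = [0] * n_bins
    for _, _, score, is_decoy in hits:
        # The maximum score is clamped into the last bin.
        idx = min(int((score - lo) / width), n_bins - 1)
        if is_decoy:
            decoys[idx] += 1
        else:
            targets[idx] += 1
    return edges, targets, decoys


# Made-up example hits.
example_hits = [
    ("s1", 1, 1.0, False),
    ("s1", 2, 0.125, True),
    ("s2", 1, 0.0, True),
    ("s3", 1, 0.5, False),
    ("s4", 1, 0.25, True),
]
edges, targets, decoys = target_decoy_histogram(example_hits)
print(edges, targets, decoys)
```

Exporting fixed bins keeps the mzQC small; a full density would mean writing out every score.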
idXMLs (after Percolator/IDPEP):
- target vs decoy distribution again
idXMLs (after consensusID):
- overlap between search engines (e.g. 2D plot for every pair of search engines)
- histogram of number of times a PSM was identified with same, with different, ...
- number of targets vs. decoys
- hits per PSM?
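The search engine overlap could be summarized before consensusID along these lines. Again only a sketch: the input layout (engine name mapped to `{spectrum_id: best peptide}`), the function names and the example identifications are assumptions, and real data would first have to be extracted from the per-engine idXMLs.

```python
from collections import Counter
from itertools import combinations


def pairwise_overlap(ids_by_engine):
    """For every pair of engines, count shared spectra and whether the peptides agree."""
    result = {}
    for a, b in combinations(sorted(ids_by_engine), 2):
        shared = ids_by_engine[a].keys() & ids_by_engine[b].keys()
        same = sum(1 for s in shared if ids_by_engine[a][s] == ids_by_engine[b][s])
        result[(a, b)] = {
            "shared": len(shared),
            "same": same,
            "different": len(shared) - same,
        }
    return result


def engines_per_spectrum(ids_by_engine):
    """Histogram: number of engines identifying a spectrum -> number of spectra."""
    per_spectrum = Counter(s for ids in ids_by_engine.values() for s in ids)
    return Counter(per_spectrum.values())


# Made-up example identifications for two engines.
example_ids = {
    "comet": {"s1": "PEPTIDEK", "s2": "ELVISK", "s3": "LIVESK"},
    "msgf": {"s1": "PEPTIDEK", "s2": "ELVLSK", "s4": "AAAK"},
}
overlap = pairwise_overlap(example_ids)
histogram = engines_per_spectrum(example_ids)
print(overlap)
print(histogram)
```

The per-pair counts would feed the 2D plots, and the second function covers the "identified by how many engines" histogram.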
idXMLs (after filtering):
- do we need anything here?
idXMLs (after inference):
- see #27
- whether this can be inferred by comparing the mzTab with the raw IDs per file depends a bit on the order of FDR filtering. Currently we FDR-filter before quantification, so it might indeed be helpful to know whether a protein is missing because of filtering after inference or because of missing quant data.
- in any case, we need that information, since by default we also filter out decoys, and a target-decoy score distribution plot would be helpful for proteins as well.
- for TMT the inference idXML is easily accessible
features:
- since we only generate features internally for ProteomicsLFQ, we must export summarized feature QC metrics during execution (or write out the temporary featureXMLs even without debug mode).
- for TMT this does not really exist because the "consensus" features are not really 2D features
consensus features:
- is there anything important that is not available in the mzTab?
Let's also keep https://github.com/axelwalter and https://github.com/cbielow in the loop.