mzLib
Run two technical replicates to test deconvolution
The same masses should appear in both replicates with the same intensities and the same elution times. Masses that agree within some tolerance are considered real; the others are considered wrong. Then use machine learning to learn to separate real from wrong.
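The replicate comparison could be sketched roughly as follows. This is only an illustration, not mzLib code: the `Feature` record, the `matches` and `label_replicate` helpers, and the default tolerances (10 ppm mass, 0.5 min elution time, 2-fold intensity) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for one deconvoluted mass (not an mzLib type)."""
    mass: float       # deconvoluted mass, Da
    intensity: float  # summed intensity, arbitrary units
    rt: float         # apex elution time, minutes


def matches(a, b, ppm_tol=10.0, rt_tol=0.5, max_fold=2.0):
    """True if a and b agree in mass, elution time, and intensity.

    All three tolerances are placeholder values, not recommendations.
    """
    mass_ok = abs(a.mass - b.mass) <= a.mass * ppm_tol * 1e-6
    rt_ok = abs(a.rt - b.rt) <= rt_tol
    lo, hi = sorted((a.intensity, b.intensity))
    intensity_ok = lo > 0 and hi / lo <= max_fold
    return mass_ok and rt_ok and intensity_ok


def label_replicate(rep_a, rep_b, **tol):
    """Label each feature in rep_a as real (True) if rep_b has a match."""
    # Brute-force O(n*m) scan; fine for a sketch, slow for full runs.
    return [(f, any(matches(f, g, **tol) for g in rep_b)) for f in rep_a]


rep_a = [
    Feature(10000.00, 1.0e6, 20.0),  # matched in replicate B
    Feature(15000.00, 5.0e5, 30.0),  # elution time disagrees
    Feature(12000.00, 2.0e5, 25.0),  # intensity disagrees
]
rep_b = [
    Feature(10000.05, 1.2e6, 20.1),
    Feature(15000.00, 5.0e5, 35.0),
    Feature(12000.00, 1.0e6, 25.0),
]

for feature, is_real in label_replicate(rep_a, rep_b):
    print(feature.mass, "real" if is_real else "wrong")
```

The resulting real/wrong labels are what a classifier would then be trained on; the tolerances would need to be tuned against actual replicate data.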
Another machine-learning idea:
To use machine learning to determine a deconvolution scoring formula, we need a good training set of true-positive identifications (i.e. masses). We could use the NeuCode-labeled yeast or E. coli data that we already have. The lysine count and Proteoform Suite error-checking methods would give us increased confidence that the deconvoluted masses in the training set are true positives. This assumes that Thermo deconvolution and Proteoform Suite identify masses correctly; we could limit the set to the most confident IDs and/or apply some other criteria.
We also need a set of false-positive identifications! Without it, I don't see how the machine learning would proceed.