
MetFrag Score Calculation - How are the scores calculated in patRoon?

Open Theodoralee27 opened this issue 5 months ago • 6 comments

Dear Rick and Team,

I would like to ask for help understanding how the scores for each feature are calculated by the generateCompounds() function. My main objective is to classify the identified features using Schymanski's levels of confidence (Levels 2a - 5). Currently, the most relevant function linking the scores calculated in patRoon to Schymanski's confidence levels is annotateSuspects(), but I realised that it only applies to suspect screening, while the data I'm interested in classifying is non-targeted.

Here is a preview of what my data looks like after running generateCompounds() with MetFrag and the CompTox database. [image]

  1. My first question: is the score (in column D; highlighted) calculated with the same weighting scheme as in this study? Lai, A. et al. (2021) ‘Retrospective non-target analysis to support regulatory water monitoring: from masses of interest to recommendations via in silico workflows’, Environmental Sciences Europe. Springer Berlin Heidelberg, 33(1), pp. 1–21. doi: 10.1186/s12302-021-00475-1.

Here is a screenshot of their calculations: [image]

I briefly tried the same calculation with my own data, and the resulting scores are quite similar. Could you confirm whether this is the correct calculation method? If not, could you kindly refer me to any document that describes how the final scores are calculated?
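The weighted scheme being asked about can be sketched as follows. This is only an illustration and not necessarily what patRoon or MetFrag does internally: the term names, weights and values are made up, and the one assumption is that each term is divided by its largest value among the candidates being compared before it is multiplied by its weight.

```python
def weighted_scores(candidates, weights):
    """Return one combined score per candidate.

    Each term is divided by its largest value among the candidates
    (assumed normalisation), multiplied by its weight, and summed.
    """
    maxima = {term: max(c[term] for c in candidates) for term in weights}
    scores = []
    for cand in candidates:
        total = 0.0
        for term, weight in weights.items():
            # A term whose maximum is 0 contributes nothing (avoids division by zero).
            if maxima[term] > 0:
                total += weight * cand[term] / maxima[term]
        scores.append(total)
    return scores


# Hypothetical term names, weights and values, for illustration only.
weights = {"fragmenter": 1.0, "mona": 1.0, "data_sources": 1.0}
candidates = [
    {"fragmenter": 120.0, "mona": 0.0, "data_sources": 8},
    {"fragmenter": 60.0, "mona": 0.0, "data_sources": 2},
]
scores = weighted_scores(candidates, weights)
print(scores)  # [2.0, 0.75]
```

Under this assumption, and as long as all term values are non-negative, no candidate can score higher than the sum of the weights.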

  2. My second question follows from the first. Given the known maximum score of 9.0, I was surprised to find two features in my dataset with a score of 13 or above. This made me doubt the calculation method, but when I checked each parameter I realised that the 'DATA_SOURCES' values for these two features were unusually high, around 40 - 42, while those of the other features only ranged from about 1 to 8. Because each parameter is normalised against its largest value, I was wondering whether the system could recognise high outliers such as 40 and 42 and exclude them from the analysis; otherwise the scores of all other compounds will be very low for that parameter ("DATA_SOURCES"). Again, this is my speculation. I am uploading one batch of data containing that feature for your reference. Compounds_comptox_Npb6.csv

  3. Is there an established conversion between MetFrag scores and Schymanski's confidence levels? Unfortunately, I haven't been able to find much supporting evidence for this. The closest I found is the use of an IndividualMoNA score > 0.9 to assign Schymanski level 2a (same study, Lai et al. 2021). Another study suggested ranges of MoNA scores for levels 3a and 3b: Talavera Andújar, B. et al. (2022) ‘Studying the Parkinson’s disease metabolome and exposome in biological samples through different analytical and cheminformatics approaches: a pilot study’, Analytical and Bioanalytical Chemistry. Springer Berlin Heidelberg, 414(25), pp. 7399–7419. doi: 10.1007/s00216-022-04207-z. [image] However, the IndividualMoNA values for all of my features are 0. Would this mean that all my features are of level 3c or lower confidence?
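The outlier concern raised in the second question can be illustrated with a short calculation. This is a sketch with made-up DATA_SOURCES counts, chosen within the ranges mentioned above and assuming plain division by the largest value; it is not taken from patRoon's or MetFrag's code.

```python
def normalise_by_max(values):
    """Divide every value by the largest one (assumed normalisation)."""
    largest = max(values)
    if largest <= 0:
        return [0.0] * len(values)
    return [v / largest for v in values]


# Hypothetical DATA_SOURCES counts: most between 1 and 8, one outlier at 42.
with_outlier = [1, 4, 8, 42]
without_outlier = [1, 4, 8]

print([round(x, 3) for x in normalise_by_max(with_outlier)])     # [0.024, 0.095, 0.19, 1.0]
print([round(x, 3) for x in normalise_by_max(without_outlier)])  # [0.125, 0.5, 1.0]
```

With the outlier present, a count of 8 contributes only about 0.19 instead of 1.0 to this term, which is the compression effect described in the second question.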

Thank you so much in advance for your help! I would really appreciate it if you could share any details regarding the score calculation.

Sincerely, Theo

Theodoralee27, Sep 05 '24 09:09