
xcmsSetStatistic() may produce a model with suboptimal results

Open · rickhelmus opened this issue 5 years ago · 2 comments

Hello,

Recently I started delving into the (very interesting!) IPO package. By pure coincidence I noticed during a test run that the final results were not optimal.

Reproducible example:

# install the test dataset
devtools::install_github("rickhelmus/patRoonData")

# collect the example mzML files shipped with patRoonData
anaList <- list.files(patRoonData::exampleDataPath(), pattern = "\\.mzML", full.names = TRUE)

# start from the default centWave parameters and narrow two of the ranges
ppParams <- IPO::getDefaultXcmsSetStartingParams("centWave")
ppParams$min_peakwidth <- c(4, 12)
ppParams$ppm <- c(3, 10)
ppParams$method <- "centWave"

# optimize the peak picking parameters for two of the files
iOpt <- IPO::optimizeXcmsSet(anaList[4:5], ppParams, nSlaves = 1)

The experimental results and plots of the fourth (and final) experiment look promising:

> iOpt[[4]]$response
      exp num_peaks notLLOQP num_C13      PPS
 [1,]   1       543      288     118 48.34722
 [2,]   2       170       65      46 32.55385
 [3,]   3       573      314     118 44.34395
 [4,]   4       208       80      60 45.00000
 [5,]   5       568      306     122 48.64052
 [6,]   6       186       74      46 28.59459
 [7,]   7       596      320     121 45.75312
 [8,]   8       228       93      64 44.04301
 [9,]   9       543      288     118 48.34722
[10,]  10       170       65      46 32.55385
[11,]  11       573      314     118 44.34395
[12,]  12       208       80      60 45.00000
[13,]  13       567      306     122 48.64052
[14,]  14       186       74      46 28.59459
[15,]  15       595      321     119 44.11526
[16,]  16       228       93      64 44.04301
[17,]  17       266       75      80 85.33333
[18,]  18       572      295     125 52.96610
[19,]  19       195       75      52 36.05333
[20,]  20       235       70      76 82.51429
[21,]  21       365      153      98 62.77124
[22,]  22       258       69      82 97.44928
[23,]  23       269       80      84 88.20000
[24,]  24       266       75      80 85.33333
[25,]  25       266       75      80 85.33333
[26,]  26       266       75      80 85.33333

(attached image: response surface model plots of experiment 4, "rsm_4")

However, the final result obtained with the model-predicted parameters has a much lower score:

> max(iOpt[[4]]$response[, 5])
[1] 97.44928

> iOpt[[4]]$PPS
    ExpId    #peaks    #NonRP       #RP       PPS 
  0.00000 322.00000 124.00000  88.00000  62.45161
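
Pulling both numbers out side by side makes the gap explicit (element names as in the output above):

bestExpPPS <- max(iOpt[[4]]$response[, 5])  # 97.44928, best PPS actually observed
modelPPS   <- iOpt[[4]]$PPS[["PPS"]]        # 62.45161, PPS with model-predicted parameters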

I suspect the final combination of parameters hits a corner case where XCMS suddenly yields very different results from what the model predicts. However, I'm just brushing up on my DoE knowledge, so any ideas here would be welcome!

In this case the final result is lower than that of the third experiment (PPS: 85.3), hence resultIncreased() returns FALSE. Interestingly, since the max_settings used to select the 'best' experimental iteration are calculated from the model (i.e. instead of from the actual result), the last experiment is still taken as the optimum result.

Anyway, I noticed that IPO is (unfortunately) no longer actively maintained. Still, I hope to start some discussion on what a solution could look like. A simple approach might be to check whether the response obtained with the model-predicted parameters is actually the best and, when it is not, take the best conditions from the experiments that led to the model. What do you think?
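
To make the idea concrete, here is a rough sketch of such a fallback. response, PPS, and max_settings are element names taken from the output above; the helper itself and its expParams argument (a list holding the parameter set of each experiment) are hypothetical, not part of IPO's API:

# hypothetical helper, not part of IPO: keep the model-predicted optimum
# only if it actually beats the best experiment that built the model
pickBestSettings <- function(iter, expParams) {
  bestExp <- which.max(iter$response[, 5])    # column 5 holds the PPS
  if (iter$PPS[["PPS"]] >= iter$response[bestExp, 5]) {
    iter$max_settings                         # model optimum confirmed by the actual run
  } else {
    expParams[[bestExp]]                      # fall back to the best tested settings
  }
}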

rickhelmus · Oct 22 '18