IPO
xcmsSetStatistic() may produce model with suboptimal results
Hello,
Recently I started delving into the (very interesting!) IPO package. By pure coincidence, I noticed that during a test run the final results were not optimal.
Reproducible example:
# test dataset
devtools::install_github("rickhelmus/patRoonData")
anaList <- list.files(patRoonData::exampleDataPath(), pattern = "\\.mzML", full.names = TRUE)
ppParams <- IPO::getDefaultXcmsSetStartingParams("centWave")
ppParams$min_peakwidth <- c(4, 12)
ppParams$ppm <- c(3, 10)
ppParams$method <- "centWave"
iOpt <- IPO::optimizeXcmsSet(anaList[4:5], ppParams, nSlaves = 1)
The experimental results and plots of the fourth (and final) experiment look promising:
> iOpt[[4]]$response
exp num_peaks notLLOQP num_C13 PPS
[1,] 1 543 288 118 48.34722
[2,] 2 170 65 46 32.55385
[3,] 3 573 314 118 44.34395
[4,] 4 208 80 60 45.00000
[5,] 5 568 306 122 48.64052
[6,] 6 186 74 46 28.59459
[7,] 7 596 320 121 45.75312
[8,] 8 228 93 64 44.04301
[9,] 9 543 288 118 48.34722
[10,] 10 170 65 46 32.55385
[11,] 11 573 314 118 44.34395
[12,] 12 208 80 60 45.00000
[13,] 13 567 306 122 48.64052
[14,] 14 186 74 46 28.59459
[15,] 15 595 321 119 44.11526
[16,] 16 228 93 64 44.04301
[17,] 17 266 75 80 85.33333
[18,] 18 572 295 125 52.96610
[19,] 19 195 75 52 36.05333
[20,] 20 235 70 76 82.51429
[21,] 21 365 153 98 62.77124
[22,] 22 258 69 82 97.44928
[23,] 23 269 80 84 88.20000
[24,] 24 266 75 80 85.33333
[25,] 25 266 75 80 85.33333
[26,] 26 266 75 80 85.33333
However, the final result calculated by the model has a much lower score:
> max(iOpt[[4]]$response[, 5])
[1] 97.44928
> iOpt[[4]]$PPS
ExpId #peaks #NonRP #RP PPS
0.00000 322.00000 124.00000 88.00000 62.45161
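For reference, the gap is easy to see directly from the objects above (a quick check; column 5 of the response matrix is the PPS):
resp <- iOpt[[4]]$response
bestExp <- which.max(resp[, 5])  # experiment with the highest measured PPS
resp[bestExp, ]                  # best run from the design (PPS ~97.4 above)
iOpt[[4]]$PPS                    # run with the model-derived parameters (PPS ~62.5)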
I suspect the final combination of parameters hits a corner case where XCMS suddenly yields very different results than the model could predict. However, I'm just brushing up my DoE knowledge, so any ideas here would be welcome!
In this case the final result is lower than that of the third experiment (PPS: 85.3), hence resultIncreased() returns FALSE. Interestingly, since the max_settings are used to find the 'best' experimental iteration and are calculated by the model (i.e. instead of from the actual result), the last experiment is still taken as the optimal result.
Anyway, I noticed that IPO is unfortunately no longer actively maintained. Still, I hope to start some discussion on what a solution to this could be. A simple approach might be to check whether the response obtained with the model's parameters really is the best, and when it's not, to take the best conditions from the experiments that led to the model. What do you think?
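Reusing resp and bestExp from the snippet above, something along these lines is what I have in mind (just a rough sketch; getExpParams() is made up here and not an actual IPO function):
if (iOpt[[4]]$PPS[5] < resp[bestExp, 5]) {
    # the model-derived optimum underperforms: fall back to the settings of
    # the best experiment from the design (getExpParams() is hypothetical)
    finalSettings <- getExpParams(iOpt[[4]], bestExp)
} else {
    finalSettings <- iOpt[[4]]$max_settings  # keep the model-derived optimum
}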
Hi @rickhelmus, thanks a lot for your comment and for starting this discussion.
The point you brought up is a very good one. I did notice this behaviour before, but never had the chance to implement an enhancement. To do so, the function optimizeXcmsSet would need to be adjusted.
You are right that IPO is no longer actively maintained at the moment. There is still some discussion about how IPO might be further developed, but unfortunately I won't be able to work on it in the near future. But maybe this is a case to get the ball rolling.
Thanks for starting the discussion (and sorry for my belated reply).
I've implemented the simple change where it switches to the parameters from an experiment with a better response when this situation happens (with some user-defined allowed deviation). This seems to improve things, at least.
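Roughly, the switch condition looks like this (simplified; maxModelDeviation is the user-defined tolerance mentioned above, the other names are illustrative):
# switch to the best experimental settings only if the model result falls
# more than maxModelDeviation (e.g. 0.1 = 10%) below the best measured PPS
useExperiment <- modelPPS < (1 - maxModelDeviation) * max(experimentPPS)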
I also tried adding any sub-optimal results to the model in the hope of improving its predictions. This, however, only seemed to make things worse.