DeepRTplus

Understanding Prediction Results

Open · bennyruss opened this issue 4 years ago · 3 comments

Hello,

I am trying to make sense of some results I got from the model, but the observed and predicted values seem to be on different scales. Here is a sample of the output:

| observed | predicted |
|---|---|
| 0.06345 | 0.72494 |
| 0.07636 | 0.37529 |
| 0.082 | 0.66338 |
| 0.08482 | 0.5969 |
| 0.08264 | 0.46543 |
| 0.07091 | 0.43927 |
| 0.067 | 0.26192 |
| 0.07445 | 0.25262 |
| 0.06682 | 0.26192 |
| 0.05955 | 0.40488 |

I have a dataset of 78 peptides I would like to test on, so I pointed `test_path` at that file but left all the other parameters unchanged. The RTs in the test data are in seconds. My config:

```python
train_path = 'data/mod_train_2.txt'
test_path = 'data/DeepRTtest.txt'
result_path = 'work/mod_pred_test.txt'
log_path = 'work/mod_test.log'
save_prefix = 'work/mod/2/3'
pretrain_path = ''
dict_path = ''

conv1_kernel = 12
conv2_kernel = 12
min_rt = 0
max_rt = 110
time_scale = 60
max_length = 50
```

Thank you for the help!

bennyruss · Jun 10 '20 14:06

Hi,

If you only want to predict RT values for a small test set (without training), I would suggest using:

```
python prediction_emb_cpu.py max_rt param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt 12 data/DeepRTtest.txt
```

where max_rt is the maximum RT value among your 78 peptides (see here for details). In "config.py", the only change would be:

```python
max_length = 66  # we predict with the model trained on the "DIA" data, so use that dataset's maximum peptide length
```
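To get the max_rt argument, a small helper could read the test file and take its largest RT. This is a minimal sketch, not part of DeepRTplus: it assumes the test file is tab-separated with the peptide sequence in the first column and the RT in the second (check this against your own data/DeepRTtest.txt), and the helper names `max_rt_in_file` and `prediction_command` are hypothetical. The command it builds only reuses the argument order of the command above.

```python
import csv
import os


# ASSUMPTION: tab-separated file, peptide sequence in column 1, RT in column 2.
# Rows whose RT field is not numeric (e.g. a header line) are skipped.
def max_rt_in_file(path: str, time_scale: float = 1.0) -> float:
    """Return the largest RT in the file divided by time_scale (60 turns seconds into minutes)."""
    rts = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # blank or malformed line
            try:
                rts.append(float(row[1]))
            except ValueError:
                continue  # header or non-numeric RT
    if not rts:
        raise ValueError(f"no numeric RT values found in {path}")
    return max(rts) / time_scale


def prediction_command(max_rt: float, test_path: str) -> list[str]:
    """Build the argument list of the command above (hypothetical helper)."""
    return [
        "python", "prediction_emb_cpu.py",
        f"{max_rt:g}",  # max_rt; its unit decides the unit of the predictions
        "param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt",
        "12",           # third argument, copied as-is from the command above
        test_path,
    ]


if __name__ == "__main__":
    test_path = "data/DeepRTtest.txt"  # path used in this thread
    if os.path.exists(test_path):
        max_rt = max_rt_in_file(test_path, time_scale=60)  # seconds -> minutes
        print(" ".join(prediction_command(max_rt, test_path)))
```

Dividing by `time_scale=60` gives max_rt in minutes; calling `max_rt_in_file(test_path)` without it keeps seconds.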

Best,

horsepurve · Jun 10 '20 20:06

Hi there, thank you for the response!

So in config.py, should I specify the max_rt of my data in minutes, with time_scale = 60 converting it to seconds? Like so:

```python
train_path = 'data/mod_train_2.txt'
test_path = 'data/mod_test_2.txt'
result_path = 'work/mod_test_2.pred.txt'
log_path = 'work/mod_test_2.log'
save_prefix = 'work/mod/2/3'
pretrain_path = ''
dict_path = ''

conv1_kernel = 12
conv2_kernel = 12
min_rt = 0
max_rt = 11.2
time_scale = 60
max_length = 66
```

And then, when I run the script you suggested, should I pass max_rt in seconds, like this?

```
python prediction_emb_cpu.py 672 param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt 12 data/DeepRTtest.txt
```

I should also note that the data I am trying to predict come from an RPLC system. Am I using the right training files?

Thanks, Ben

bennyruss · Jun 11 '20 14:06

Hi, sorry for the delayed response. Yes, generally all RT values here are in minutes, so "max_rt = 11.2" (minutes), and "time_scale = 60" converts the RT values in the data file (seconds) to minutes, not the other way around. When running "prediction_emb_cpu.py", the output unit follows the unit of the max_rt argument: if max_rt is given in minutes (11.2), the predicted values are written in minutes, and if it is given in seconds (672), the output file is in seconds too.
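A minimal sketch of that unit bookkeeping, assuming (as the reply implies but does not state outright) that the network outputs a value normalized to [0, 1], which is mapped back to RT as min_rt + y * (max_rt - min_rt). The helper names `to_minutes` and `denormalize` are hypothetical, not DeepRTplus functions.

```python
# ASSUMPTION: predictions are normalized to [0, 1] and rescaled with max_rt/min_rt;
# these helpers only illustrate the units and are not DeepRTplus code.

TIME_SCALE = 60  # value of time_scale in config.py: seconds per minute


def to_minutes(rt_seconds: float, time_scale: float = TIME_SCALE) -> float:
    """Convert an RT from the data file (seconds) to minutes."""
    return rt_seconds / time_scale


def denormalize(y: float, max_rt: float, min_rt: float = 0.0) -> float:
    """Map a normalized prediction back to an RT in whatever unit max_rt uses."""
    return min_rt + y * (max_rt - min_rt)


if __name__ == "__main__":
    max_rt_min = to_minutes(672)        # 11.2 minutes
    y = 0.5                             # a hypothetical normalized prediction
    print(denormalize(y, max_rt_min))   # RT in minutes
    print(denormalize(y, 672))          # the same RT in seconds
```

Under this assumption the same normalized output scaled by 11.2 or by 672 gives one retention time expressed in minutes or in seconds, which matches the behaviour described in the reply.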

Yes, the model provided here was also trained on data from an RPLC system [1]. However, it usually cannot be applied directly to another RPLC dataset, because the gradients usually differ. Running "prediction_emb_cpu.py" directly will therefore only give estimated RTs for your peptides rather than their precise retention times under your chromatographic conditions. To obtain a more accurate prediction, a set of calibration peptides is typically needed (i.e. transfer learning).
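When only a few calibration peptides are available, a simpler alternative is a linear fit that maps predicted RT onto observed RT. The sketch below shows that ordinary least-squares fit; it is a swapped-in technique, not the transfer learning recommended above, and the helper names and calibration values in it are made up for illustration.

```python
# Ordinary least-squares linear calibration (NOT DeepRTplus transfer learning).


def fit_linear(predicted: list[float], observed: list[float]) -> tuple[float, float]:
    """Fit observed ~ slope * predicted + intercept by least squares."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need at least two paired values")
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(rt: float, slope: float, intercept: float) -> float:
    """Apply the fitted line to a new predicted RT."""
    return slope * rt + intercept


if __name__ == "__main__":
    # Hypothetical calibration peptides: predicted vs observed RT, both in minutes.
    pred = [20.0, 35.0, 50.0, 65.0]
    obs = [3.1, 5.2, 7.4, 9.5]
    slope, intercept = fit_linear(pred, obs)
    print(f"slope={slope:.4f} intercept={intercept:.4f}")
    print(calibrate(42.0, slope, intercept))
```

A linear map can only correct a shift and a stretch between gradients; nonlinear gradient differences are one reason retraining on calibration peptides is preferred.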

[1] A Repository of Assays to Quantify 10,000 Human Proteins by SWATH-MS. Sci. Data 2014, 1, 140031, DOI: 10.1038/sdata.2014.31

horsepurve · Jun 13 '20 22:06