DeepRTplus
Understanding Prediction Results
Hello,
I am trying to make sense of some results I got from the model, but they seem to be on different scales. Here is a sample of the output:

```
observed | predicted
0.06345  | 0.72494
0.07636  | 0.37529
0.082    | 0.66338
0.08482  | 0.5969
0.08264  | 0.46543
0.07091  | 0.43927
0.067    | 0.26192
0.07445  | 0.25262
0.06682  | 0.26192
0.05955  | 0.40488
```

I have a dataset of 78 peptides I would like to test on, so I pointed `test_path` at that dataset but left all the other parameters the same. The RT values in the test data are in seconds.

```
train_path = 'data/mod_train_2.txt'
test_path = 'data/DeepRTtest.txt'
result_path = 'work/mod_pred_test.txt'
log_path = 'work/mod_test.log'
save_prefix = 'work/mod/2/3'
pretrain_path = ''
dict_path = ''
conv1_kernel = 12
conv2_kernel = 12
min_rt = 0
max_rt = 110
time_scale = 60
max_length = 50
```
Thank you for the help!
Hi,
If you only want to predict RT values for a handful of testing data (without training), I would suggest using:
python prediction_emb_cpu.py max_rt param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt 12 data/DeepRTtest.txt
where `max_rt` is the maximum RT value among your 78 items (see here for details). In `config.py` the only change would be:
```
max_length = 66  # since we are using the "DIA" data for prediction, we change it to be its max peptide length
```
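For reference, the way `min_rt`, `max_rt`, and `time_scale` interact can be sketched as below. This is a hypothetical illustration of the min-max scaling, not the repository's exact code; the defaults shown are the values from the config above.

```python
# Hypothetical sketch of how min_rt / max_rt / time_scale interact in a
# DeepRT-style pipeline; illustrative only, not the repository's code.
def normalize_rt(rt_raw, min_rt=0.0, max_rt=110.0, time_scale=60):
    """Map a raw RT (e.g. in seconds) into the model's [0, 1] range."""
    rt_minutes = rt_raw / time_scale  # e.g. seconds -> minutes
    return (rt_minutes - min_rt) / (max_rt - min_rt)

def denormalize_rt(rt_norm, min_rt=0.0, max_rt=110.0, time_scale=60):
    """Map a model output in [0, 1] back to the original time units."""
    return (rt_norm * (max_rt - min_rt) + min_rt) * time_scale
```

For example, with `min_rt = 0`, `max_rt = 110`, and `time_scale = 60`, an RT of 3300 seconds (55 minutes) normalizes to 0.5. A mismatch between the `max_rt` used for normalization and the range your data actually spans would produce the kind of scale discrepancy seen in the sample output.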
Best,
Hi there, thank you for the response!
So in the `config.py` file, should I specify the `max_rt` of my data in minutes, and then the scale of 60 converts that to seconds? Like so:

```
train_path = 'data/mod_train_2.txt'
test_path = 'data/mod_test_2.txt'
result_path = 'work/mod_test_2.pred.txt'
log_path = 'work/mod_test_2.log'
save_prefix = 'work/mod/2/3'
pretrain_path = ''
dict_path = ''
conv1_kernel = 12
conv2_kernel = 12
min_rt = 0
max_rt = 11.2
time_scale = 60
max_length = 66
```
And then, when I run the script you told me to run, do I put the `max_rt` in seconds?
python prediction_emb_cpu.py **672** param_cpu/dia_all_epo20_dim24_conv12/dia_all_epo20_dim24_conv12_filled.pt 12 data/DeepRTtest.txt
I will also note that the data I am trying to predict on is from an RPLC system. Am I using the right training files?
Thanks, Ben
Hi, sorry for the delayed response. Yes, generally all the RT values here are in minutes, so "max_rt = 11.2" (minutes), and "time_scale = 60" converts the RT in the data file from seconds to minutes. When running the "prediction_emb_cpu.py" script, if the max_rt argument is in minutes (i.e. 11.2), the predicted values will be written in minutes; if it is in seconds (i.e. 672), the output file will be in seconds too.
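If it helps, the max_rt to pass on the command line can be computed directly from the test file. This hypothetical helper assumes a tab-separated file with the RT (here in seconds) in the second column and no header row, which may not match your actual file layout:

```python
# Hypothetical helper to compute the max_rt argument for
# prediction_emb_cpu.py. Assumes a tab-separated test file with the RT
# in the second column and no header row.
def max_rt_from_file(path, time_scale=60):
    """Return (max RT in file units, max RT divided by time_scale)."""
    max_rt = float("-inf")
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2 or not fields[1]:
                continue  # skip malformed lines
            max_rt = max(max_rt, float(fields[1]))
    return max_rt, max_rt / time_scale
```

For a file whose largest RT is 672 seconds, this returns 672.0 and 11.2, matching the two unit choices discussed above.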
Yes, the model provided here is also from an RPLC system [1]. However, this model usually cannot be applied directly to another RPLC dataset, because the gradients usually differ. So directly running "prediction_emb_cpu.py" will only give estimated RTs for your peptides, not their precise retention times under your chromatographic conditions. To obtain a more accurate prediction, a set of calibration peptides is typically needed (i.e. transfer learning).
[1] A Repository of Assays to Quantify 10,000 Human Proteins by SWATH-MS. Sci. Data 2014, 1, 140031, DOI: 10.1038/sdata.2014.31
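As a lighter-weight alternative to full transfer-learning retraining, a simple linear calibration fitted on a few calibration peptides can sometimes correct for gradient differences. This sketch fits an ordinary least-squares line mapping predicted RTs to observed RTs; it is an illustration of the idea, not part of DeepRTplus:

```python
# Illustrative linear calibration: fit observed ~= slope * predicted +
# intercept by ordinary least squares on calibration peptides, then
# apply the fitted line to new predictions. Not part of DeepRTplus.
def fit_linear_calibration(predicted, observed):
    """Return (slope, intercept) of the least-squares fit."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var = sum((p - mean_p) ** 2 for p in predicted)
    slope = cov / var
    intercept = mean_o - slope * mean_p
    return slope, intercept

def apply_calibration(rt_pred, slope, intercept):
    """Map a raw model prediction onto the target gradient's time scale."""
    return slope * rt_pred + intercept
```

A linear map only corrects for a global offset and stretch between gradients; if the elution order itself changes, retraining with calibration peptides is still the better option.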