TCRdock
Parallel setup_for_alphafold.py?
Hello, the script that prepares the data for prediction, setup_for_alphafold.py, takes a long time when you have a lot of data. I've noticed that it isn't parallelized. Is it possible to speed it up?
Hi Paul, I bet that would be possible. For me, the bottleneck is running the predictions, so I haven't put much thought into doing that yet. Believe it or not, it used to be even slower!
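One way this could be approached without touching the script itself is to split the input into chunks and run a separate setup process on each chunk. The sketch below only shows the splitting step. It assumes the targets are listed one per row in a tab-separated file with a header line; `split_targets` and the chunk file names are hypothetical helpers, not part of TCRdock. The command-line arguments needed to run setup_for_alphafold.py on each chunk are not given in this thread, so they are left out, and whether the per-chunk outputs can simply be merged afterwards would need to be checked.

```python
from pathlib import Path


def split_targets(targets_path, n_chunks, out_dir):
    """Split a header-bearing TSV into up to n_chunks files.

    Hypothetical helper (not part of TCRdock). Rows are dealt out
    round-robin so the chunks end up roughly the same size, and every
    chunk file repeats the header line.
    """
    targets_path = Path(targets_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    lines = targets_path.read_text().splitlines()
    header, rows = lines[0], lines[1:]

    chunk_paths = []
    for i in range(n_chunks):
        chunk = rows[i::n_chunks]
        if not chunk:
            continue  # fewer rows than chunks: skip empty files
        path = out_dir / f"{targets_path.stem}_chunk{i}.tsv"
        path.write_text("\n".join([header] + chunk) + "\n")
        chunk_paths.append(path)
    return chunk_paths
```

Each returned path could then be handed to its own setup process, for example through `subprocess.run` calls submitted to a `concurrent.futures.ThreadPoolExecutor`, with one output directory per chunk.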
Also, note that there is an updated docking protocol and associated fine-tuned parameter set available. You can see an example of how that works in the google-colab jupyter notebook:
https://github.com/phbradley/TCRdock/blob/main/tcrdock_colab_pipeline_v1.ipynb
tldr:
- update to the newest version of the TCRdock repository
- add the flag `--new_docking` when running `setup_for_alphafold.py`
- add the flags `--model_names model_2_ptm_ft4 --model_params_files /path/to/tcrpmhc_run4_af_mhc_params_891.pkl` when running `run_prediction.py`, where that `.pkl` file is the fine-tuned parameter set downloaded from here:
https://www.dropbox.com/s/jph8v1mfni1q4y8/tcrpmhc_run4_af_mhc_params_891.pkl
Indeed, predictions take much longer than setup; my comment was just a suggestion for a small optimization. Thanks for the information about the changes and the new model, I'm currently testing it. By the way, I was hoping that the new model would reduce the calculation time a bit, but the duration is the same. Is it only the accuracy that has improved?
Good question! The new model runs a single AlphaFold simulation per target, rather than running three and keeping the one with the best pMHC:TCR PAE score, so in that sense it is 3x faster.
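For readers unfamiliar with the older best-of-three selection, here is a minimal sketch of the idea. It assumes that "best" means the lowest pMHC:TCR PAE, since PAE is a predicted error. The function name, run labels, and PAE values are made up for illustration and are not taken from TCRdock.

```python
def pick_best_run(pae_by_run):
    """Return the (run_name, pae) pair with the lowest PAE.

    Illustrative only: pae_by_run maps a run label to its
    pMHC:TCR PAE value, and lower is assumed to be better.
    """
    if not pae_by_run:
        raise ValueError("no runs to choose from")
    return min(pae_by_run.items(), key=lambda item: item[1])


# Hypothetical PAE values for three runs on the same target.
example = {"run_a": 7.2, "run_b": 5.1, "run_c": 6.4}
best_name, best_pae = pick_best_run(example)
```

With a single fine-tuned model there is only one run per target, so this selection step is no longer needed.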