TCRdock

Parallel setup_for_alphafold.py?

Open StysPaul opened this issue 2 years ago • 5 comments

Hello, the script that prepares the data for prediction, setup_for_alphafold.py, takes a long time when you have a lot of data. I've noticed that it's not parallelized. Is it possible to speed it up?

StysPaul avatar Aug 25 '23 14:08 StysPaul

Hi Paul, I bet that would be possible. For me, the bottleneck is running the predictions, so I haven't put much thought into doing that yet. Believe it or not, it used to be even slower!
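One way this could be done without touching the script itself is to split the targets file into chunks and launch one setup process per chunk. The following is only a minimal sketch under stated assumptions: the helper names (`split_tsv`, `setup_command`, `run_parallel`) are hypothetical, and the `--targets_tsvfile` and `--output_dir` flag names are assumed from the repository README, so check them against `python setup_for_alphafold.py --help` before relying on this.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_tsv(targets_tsv, chunk_dir, n_chunks):
    """Split a TSV file (with a header row) into up to n_chunks files.

    Rows are dealt out round-robin and the header is repeated in each chunk.
    """
    lines = Path(targets_tsv).read_text().splitlines()
    header, rows = lines[0], [line for line in lines[1:] if line.strip()]
    chunk_dir = Path(chunk_dir)
    chunk_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(min(n_chunks, len(rows))):
        path = chunk_dir / f"targets_chunk{i}.tsv"
        path.write_text("\n".join([header] + rows[i::n_chunks]) + "\n")
        paths.append(path)
    return paths


def setup_command(chunk_tsv, output_dir, script="setup_for_alphafold.py"):
    """Build the command line for one chunk.

    ASSUMPTION: the flag names below are not stated in this thread.
    """
    return [sys.executable, script,
            "--targets_tsvfile", str(chunk_tsv),
            "--output_dir", str(output_dir)]


def run_parallel(chunk_paths, output_root, max_workers=4):
    """Run one setup process per chunk and return the exit codes in order."""
    commands = [setup_command(p, Path(output_root) / p.stem) for p in chunk_paths]
    # Threads suffice here: each thread only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda cmd: subprocess.run(cmd, check=False), commands))
    return [r.returncode for r in results]
```

A call such as `run_parallel(split_tsv("targets.tsv", "chunks", 8), "setup_out", max_workers=8)` would write each chunk's setup into its own directory. How those per-chunk outputs are best combined before prediction is not covered here.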

phbradley avatar Aug 25 '23 14:08 phbradley

Also, note that there is an updated docking protocol and an associated fine-tuned parameter set available. You can see an example of how that works in the Google Colab Jupyter notebook:

https://github.com/phbradley/TCRdock/blob/main/tcrdock_colab_pipeline_v1.ipynb

phbradley avatar Aug 25 '23 14:08 phbradley

tldr:

  1. update to the newest version of the TCRdock repository
  2. add the flag --new_docking when running setup_for_alphafold.py
  3. add the flags --model_names model_2_ptm_ft4 --model_params_files /path/to/tcrpmhc_run4_af_mhc_params_891.pkl when running run_prediction.py, where that .pkl file is the fine-tuned parameter set downloaded from here:

https://www.dropbox.com/s/jph8v1mfni1q4y8/tcrpmhc_run4_af_mhc_params_891.pkl
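As a rough illustration of steps 2 and 3, the extra flags can be appended to whatever command lines are already in use. This is a sketch only: the helper names (`setup_args`, `prediction_args`) are hypothetical, and the remaining arguments of both scripts (input targets, output locations, AlphaFold data directory) are not named in this thread, so they are passed through untouched via `extra_args`.

```python
import sys

# Model name given in step 3 above.
FT_MODEL_NAME = "model_2_ptm_ft4"


def setup_args(extra_args=()):
    """Command line for setup_for_alphafold.py with the new docking protocol.

    extra_args holds the usual input/output arguments, which are
    not specified in this thread and are left to the caller.
    """
    return [sys.executable, "setup_for_alphafold.py", *extra_args, "--new_docking"]


def prediction_args(params_pkl, extra_args=()):
    """Command line for run_prediction.py using the fine-tuned parameters.

    params_pkl is the local path to the downloaded
    tcrpmhc_run4_af_mhc_params_891.pkl file.
    """
    return [sys.executable, "run_prediction.py", *extra_args,
            "--model_names", FT_MODEL_NAME,
            "--model_params_files", params_pkl]
```

The returned lists can be handed to `subprocess.run(...)` once `extra_args` has been filled in with the arguments from your existing setup.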

phbradley avatar Aug 25 '23 14:08 phbradley

Indeed, predictions take much longer than setup. It was just a suggestion for a small optimization. Thanks for the information about the changes and the new model; I'm currently testing it. By the way, I was hoping that the calculation time would decrease a bit with the new model, but the duration is the same. Is it only the accuracy that has improved?

StysPaul avatar Aug 27 '23 11:08 StysPaul

Good question! The new model just runs a single AlphaFold simulation per target, rather than running three and taking the best score by pMHC:TCR PAE, so in that sense it is 3x faster.
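To make the difference concrete, here is a small sketch of the older selection step, assuming "best score" means the lowest pMHC:TCR PAE (PAE is a predicted error, so lower is better). The function name, run labels, and PAE values are hypothetical and are not taken from the TCRdock code.

```python
def pick_best_run(pae_by_run):
    """Return the (run_name, pae) pair with the lowest pMHC:TCR PAE."""
    return min(pae_by_run.items(), key=lambda item: item[1])


# Older protocol: three runs per target, keep the one with the lowest PAE.
# These numbers are made up for illustration.
old_runs = {"run_a": 7.2, "run_b": 5.1, "run_c": 6.4}
best_name, best_pae = pick_best_run(old_runs)

# Newer protocol: a single run per target, so there is nothing to select
# and roughly one third of the prediction work per target.
new_runs = {"model_2_ptm_ft4": 5.0}
only_name, only_pae = pick_best_run(new_runs)
```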

phbradley avatar Aug 27 '23 14:08 phbradley