Computational performance advice
I ran HelixFold3 using a JSON file that specified three protein sequences: two antibody chains and an antigen. I used run_infer.sh, which sets --preset='reduced_dbs', although I had downloaded the full database.
The run time was 22.8 minutes.
This was on a large-memory, many-core server with an L40S GPU.
Is that run time approximately as expected? It would be prohibitively slow for the experiments I had planned.
Could you tell us the run times of the MSA search and of inference, respectively? The time is spent mainly in the MSA part; sequence length also affects inference time. You can speed up your MSA search by modifying it here: https://github.com/PaddlePaddle/PaddleHelix/blob/dev/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py#L41
Thank you! Very little time is spent in inference. It would appear that almost all the time is spent in Jackhmmer, and that core utilization by Jackhmmer is low.
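To break the 22.8 minutes down into MSA versus inference time, one way is to wrap the pipeline stages in a small timer. This is an illustrative sketch only: the stage names and the `time.sleep` placeholders stand in for the actual HelixFold3 calls, whose entry points differ.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Accumulate wall-clock time for a named pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

# Placeholders for the real pipeline calls (assumptions, not actual API):
with stage("msa_search"):
    time.sleep(0.01)   # e.g. Jackhmmer / HHblits searches would go here
with stage("inference"):
    time.sleep(0.01)   # e.g. the model forward pass would go here

for name, seconds in timings.items():
    print(f"{name}: {seconds:.2f}s")
```

Printing the per-stage totals at the end of a run makes it easy to confirm whether Jackhmmer dominates.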
We set up an MSA server on our machine and would prefer to use that, if possible and if it would be unlikely to degrade results, because that would free up disk space and might also be faster.
Other structure generation programs that we've tested spend far less time in MSA generation.
Generally, we're interested in anything that would speed up the process, so long as results are good!
Great job! Setting up an MSA server, such as ColabFold's, may be the best alternative.
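For reference, ColabFold-style MSA servers accept a FASTA query over HTTP. The sketch below only builds the submit URL and form payload; the endpoint path `ticket/msa` and the `q`/`mode` fields follow the ColabFold public server's convention, but treat them as assumptions to verify against your own server's API before use.

```python
from urllib.parse import urljoin

def build_submit_request(server_url: str, sequence: str, mode: str = "env"):
    """Build the (url, payload) pair for submitting one query sequence
    to an MMseqs2-style MSA server. Endpoint and field names are
    assumptions modeled on the ColabFold API, not a HelixFold3 API."""
    url = urljoin(server_url.rstrip("/") + "/", "ticket/msa")
    payload = {
        "q": f">query\n{sequence}\n",  # single-record FASTA
        "mode": mode,                  # e.g. "env" or "all"
    }
    return url, payload

url, payload = build_submit_request("https://api.colabfold.com", "MKTAYIAK")
# The actual submission would then be something like:
#   requests.post(url, data=payload)
# followed by polling the returned ticket until the MSA is ready.
```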
Thank you! Would it be possible for you to modify inference.py to support the arguments

    --use_msa_server       Whether to use the MMseqs2 server for MSA generation. Default is False.
    --msa_server_url TEXT  MSA server URL. Used only if --use_msa_server is set.

? This appears to be the standard interface that other structure prediction programs use.
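For concreteness, here is a minimal sketch of how an argument parser could expose the two proposed flags. The flag names mirror the request above; the default URL and the parser itself are illustrative, not the current HelixFold3 interface.

```python
import argparse

parser = argparse.ArgumentParser(description="inference.py flag sketch")
parser.add_argument(
    "--use_msa_server",
    action="store_true",
    help="Use an MMseqs2 server for MSA generation instead of local search. Default is False.",
)
parser.add_argument(
    "--msa_server_url",
    type=str,
    default="https://api.colabfold.com",  # placeholder default (assumption)
    help="MSA server URL. Used only if --use_msa_server is set.",
)

# Example invocation pointing at a locally hosted server:
args = parser.parse_args(
    ["--use_msa_server", "--msa_server_url", "http://localhost:8080"]
)
```

With `action="store_true"`, omitting `--use_msa_server` leaves it False, matching the requested default.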
@DavidBJaffe Thanks for your suggestion! Unfortunately, due to limited staffing, we have no firm schedule for improving this MSA search functionality. We are very sorry.
Thank you for responding!