
Running inference using MultiGPU

Open chinmayraneQ opened this issue 1 year ago • 6 comments

Hi everyone,

I am trying to run inference using multiple GPUs. I am currently able to run it on a single GPU via the `--model_device` argument, which defaults to `cpu`:

```python
parser.add_argument(
    "--model_device", type=str, default="cpu",
    help="""Name of the device on which to run the model. Any valid torch
         device name is accepted (e.g. "cpu", "cuda:0")"""
)
```

With this I can use `cuda:0` for GPU 0, `cuda:1` for GPU 1, and so on.

But I did not find an argument for distributed multi-GPU inference, analogous to the GPU argument in train_openfold.py.

Thanks in advance

chinmayraneQ avatar Apr 28 '23 17:04 chinmayraneQ

Hi @chinmayraneQ, this is my guess (and others who know better may correct it): there is no such argument. As things are implemented, the model must fit in a single GPU's RAM. During training, multiple GPUs are used more or less independently on independent training examples, and the incremental changes are then combined; it is like a series of jobs, each done on one GPU. Alternatively, to fit a really big job, one could borrow RAM from additional GPUs if they are available and have a fast RAM interconnect. (I am not sure how well this is supported by unified memory. It can certainly borrow RAM from the CPU, but training avoids such needs by working on parts of proteins.) I guess you have the same two options for inference, too:

  • If you have several jobs and multiple GPUs, just run multiple processes, each using one of the GPUs (see the sketch after this list).
  • If you have one really big job, you might be able to use one GPU with additional RAM borrowed from the others (again, I am not sure this is really well supported by drivers; you can certainly borrow CPU RAM). For inference on long sequences this may be necessary.
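For the first option, a minimal sketch of what I mean (the per-GPU FASTA directory names are just examples, and `[usual arguments]` is a placeholder for the database/binary/checkpoint arguments you normally pass):

```bash
# Split the input FASTA files into one directory per GPU, then launch one
# independent inference process per GPU (directory names are only examples).
python3 run_pretrained_openfold.py /workspace/fasta_dir_gpu0 [usual arguments] --model_device "cuda:0" &
python3 run_pretrained_openfold.py /workspace/fasta_dir_gpu1 [usual arguments] --model_device "cuda:1" &
wait  # block until both background runs finish
```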

Which of the two cases is yours?

vaclavhanzl avatar Apr 28 '23 18:04 vaclavhanzl

Thanks @vaclavhanzl for your reply. So you mean I have to run inference on multiple GPUs by running the inference command individually in each terminal, setting "cuda:0", "cuda:1", and so on?

Currently I am just in the initial phase of testing the environment, and I was using just one file, https://rest.uniprot.org/uniprotkb/P06214.fasta, which ran successfully on one V100 GPU (at 100% utilization on 1 out of the 4 V100s). Then I tried one more FASTA file in the hope that it would use 2 GPUs, one per file, but I got the following error:

```
INFO:/workspace/openfold/openfold/utils/script_utils.py:Loaded OpenFold parameters at /workspace/models/openfold_params/finetuning_ptm_2.pt...
INFO:run_pretrained_openfold.py:Generating alignments for sp|P06214|HEM2_RAT...
Traceback (most recent call last):
  File "run_pretrained_openfold.py", line 401, in <module>
    main(args)
  File "run_pretrained_openfold.py", line 211, in main
    precompute_alignments(tags, seqs, alignment_dir, args)
  File "run_pretrained_openfold.py", line 88, in precompute_alignments
    tmp_fasta_path, local_alignment_dir
  File "/workspace/openfold/openfold/data/data_pipeline.py", line 446, in run
    fasta_path
  File "/workspace/openfold/openfold/data/tools/jackhmmer.py", line 186, in query
    return [self._query_chunk(input_fasta_path, self.database_path)]
  File "/workspace/openfold/openfold/data/tools/jackhmmer.py", line 161, in _query_chunk
    "Jackhmmer failed\nstderr:\n%s\n" % stderr.decode("utf-8")
RuntimeError: Jackhmmer failed
stderr:
Fatal exception (source file p7_pipeline.c, line 697):
Target sequence length > 100K, over comparison pipeline limit.
(Did you mean to use nhmmer/nhmmscan?)
```

Any suggestions on this? Our observation is that maybe the 2 sequences were not able to fit on a single GPU.

So I believe we have to separate the sequence files into different folders and run them individually, one per GPU?

Thanks again for your quick response

chinmayraneQ avatar Apr 28 '23 20:04 chinmayraneQ

I guess it failed while computing the MSA, before doing anything with the GPU(s). Maybe you could share your full command line?

And yes, totally separate runs on "cuda:0", "cuda:1", etc. is what I was trying to suggest.

vaclavhanzl avatar Apr 28 '23 20:04 vaclavhanzl

Sure, I am using the same run_pretrained_openfold.py command:

```bash
python3 run_pretrained_openfold.py /workspace/fasta_dir \
    /workspace/dataset/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path /workspace/dataset/uniref90/uniref90_fasta.fasta \
    --mgnify_database_path /workspace/dataset/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path /workspace/dataset/pdb70/pdb70 \
    --uniclust30_database_path /workspace/dataset/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --output_dir ./ \
    --bfd_database_path /workspace/dataset/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --model_device "cuda:3" \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
    --config_preset "model_1_ptm" \
    --openfold_checkpoint_path /workspace/models/openfold_params/finetuning_ptm_2.pt
```

I haven't tried the --long_sequence_inference argument yet.
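If it helps, my understanding is that trying it would just mean appending the flag to the command above; this is only a sketch, with the bracketed part standing in for the same arguments already shown:

```bash
# Sketch: same invocation as above with the long-sequence flag appended.
# [same arguments as above] is a placeholder, not literal shell syntax.
python3 run_pretrained_openfold.py /workspace/fasta_dir /workspace/dataset/pdb_mmcif/mmcif_files/ \
    [same database, binary, device and checkpoint arguments as above] \
    --long_sequence_inference
```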

chinmayraneQ avatar Apr 28 '23 20:04 chinmayraneQ

I'd double-check the format of the files in /workspace/fasta_dir. Also, can you please find out the version of the installed jackhmmer?
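Something like this should show both (paths taken from your command above; I'm assuming the input files in that directory end in .fasta):

```bash
# Print the jackhmmer/HMMER version banner (first lines of the -h output)
lib/conda/envs/openfold_venv/bin/jackhmmer -h | head -n 2

# Quick look at the inputs: each FASTA file should start with a single ">" header line
ls -l /workspace/fasta_dir
head -n 2 /workspace/fasta_dir/*.fasta
```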

vaclavhanzl avatar Apr 28 '23 20:04 vaclavhanzl

  • I downloaded the file with wget directly from https://rest.uniprot.org/uniprotkb/P06214.fasta. As mentioned, it worked for one file, but the error appeared with 2 files.

  • I did not get the previous error when I used --long_sequence_inference. The sequences were processed sequentially on one GPU, with lower GPU utilization. I will try 3 sequences of the same length next.

Also, I created an issue about a problem I am facing with training, where I cannot compute alignments. Any suggestions?

https://github.com/aqlaboratory/openfold/issues/313

chinmayraneQ avatar Apr 28 '23 21:04 chinmayraneQ