
non-docker AF2 on HPC cannot use GPU: xla_bridge.py:257] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

liuqs1990 opened this issue 2 years ago · 1 comment

Hi, I am planning to use the non-docker version on an HPC system with GPUs. I am using https://github.com/amorehead/alphafold_non_docker

I found that run_alphafold.sh was not working well, so I checked https://sbgrid.org/wiki/examples/alphafold2 instead, and that approach seems to work using the run_alphafold.py script directly.

Here is what I have done:

As the website indicates, I set: TF_FORCE_UNIFIED_MEMORY=1 XLA_PYTHON_CLIENT_MEM_FRACTION=0.5 XLA_PYTHON_CLIENT_ALLOCATOR=platform
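Concretely, I export these in my shell before launching (a minimal sketch of that setup; the comments describe what each variable does):

# Set before calling run_alphafold.py, in the same shell session
export TF_FORCE_UNIFIED_MEMORY=1            # allow XLA to spill GPU memory into host RAM via unified memory
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.5   # cap JAX's GPU memory preallocation at 50%
export XLA_PYTHON_CLIENT_ALLOCATOR=platform # allocate on demand instead of preallocating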

and then:

[screenshot: nvidia-smi output]

In order to use all GPUs, I use: export CUDA_VISIBLE_DEVICES=0,1,2
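For reference, a one-liner that shows which devices JAX itself detects (a sketch; the exact device names printed vary by JAX version):

python -c 'import jax; print(jax.local_devices())'
# A working GPU install lists GPU devices, e.g. [GpuDevice(id=0), GpuDevice(id=1), GpuDevice(id=2)].
# If this prints only [CpuDevice(id=0)], JAX cannot find CUDA, which matches the
# xla_bridge warning in the title.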

Here is a screenshot of my GPU HPC node: [screenshot: htop output]

python run_alphafold.py \
--data_dir=/export/home2/ql9f/download/ \
--output_dir=/export/III-data/waters/ql9f/RESC6/T1083outdir \
--fasta_paths=/export/III-data/waters/ql9f/RESC6/T1083.fasta \
--max_template_date=2020-05-14 \
--db_preset=reduced_dbs \
--model_preset=monomer \
--uniref90_database_path=/export/home2/ql9f/download/uniref90/uniref90.fasta \
--mgnify_database_path=/export/home2/ql9f/download/mgnify/mgy_clusters_2018_12.fa \
--template_mmcif_dir=/export/home2/ql9f/download/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/export/home2/ql9f/download/pdb_mmcif/obsolete.dat \
--small_bfd_database_path=/export/home2/ql9f/download/small_bfd/bfd-first_non_consensus_sequences.fasta \
--pdb70_database_path=/export/home2/ql9f/download/pdb70/pdb70 \
--use_gpu_relax=True 

The program is running, but it does not use the GPU as I expected, and it only uses about 10 GB of RAM, which I think is too low. So my questions are: how can I use all of the GPUs, the RAM, and possibly all cores/threads on the HPC to make my prediction faster? And why can't the program find the GPU?

Any suggestion would be great. Qiushi

liuqs1990 · Mar 27 '22

Are you submitting this as a Slurm job script on the cluster? If so, are you requesting all GPUs on the compute node in the #SBATCH parameters? Note, though, that AlphaFold can currently only utilize one GPU.
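For reference, a minimal sketch of such a submission script (the partition name, module name, and resource sizes are placeholders for your cluster):

#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --partition=gpu             # placeholder: your cluster's GPU partition
#SBATCH --gres=gpu:1                # one GPU is enough, since AlphaFold only uses one
#SBATCH --cpus-per-task=8           # the MSA stage (jackhmmer/hhblits) is CPU-bound
#SBATCH --mem=64G
#SBATCH --time=24:00:00

module load cuda                    # placeholder: whatever CUDA module your site provides

export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.5
export XLA_PYTHON_CLIENT_ALLOCATOR=platform

python run_alphafold.py ...         # same flags as in the command above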

charmichaeld · Apr 14 '22