FastFold
Optimizing Protein Structure Prediction Model Training and Inference on GPU Clusters
FastFold provides a high-performance implementation of Evoformer with the following characteristics.
- Excellent kernel performance on GPU platforms
- Support for Dynamic Axial Parallelism (DAP)
  - Breaks the memory limit of a single GPU and reduces the overall training time
  - Significantly speeds up inference and makes ultra-long sequence inference possible
- Ease of use
  - Large performance gains with only a few lines of changes
  - No need to care about how the parallel part is implemented
- Faster data processing, about 3x faster than the original approach
Installation
To install and use FastFold, you will need:
- Python 3.8 or 3.9.
- NVIDIA CUDA 11.1 or above
- PyTorch 1.10 or above
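The requirements above can be checked before installing. The following is a minimal sketch: the helpers parse_version, version_at_least, and python_supported are hypothetical names, not part of FastFold, and the PyTorch check only runs if torch is already importable.

```python
import sys


def parse_version(text):
    """Turn '1.10.2+cu113' into (1, 10, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def version_at_least(found, minimum):
    """Compare dotted version strings numerically, so '1.10' ranks above '1.9'."""
    return parse_version(found) >= parse_version(minimum)


def python_supported(version_info=sys.version_info):
    """The requirements list names Python 3.8 or 3.9."""
    return tuple(version_info[:2]) in {(3, 8), (3, 9)}


if __name__ == "__main__":
    print("Python OK:", python_supported())
    try:
        import torch  # only checked when PyTorch is already installed
        print("PyTorch >= 1.10:", version_at_least(torch.__version__, "1.10"))
        print("CUDA build:", torch.version.cuda)  # compare manually against 11.1
    except ImportError:
        print("PyTorch is not installed yet")
```

Comparing parsed tuples rather than raw strings matters here, because a plain string comparison would rank "1.9" above "1.10".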
For now, you can install FastFold in one of the following ways:
Using Conda (Recommended)
We highly recommend creating an Anaconda or Miniconda environment and installing PyTorch with conda. The commands below create a new conda environment called "fastfold":
git clone https://github.com/hpcaitech/FastFold
cd FastFold
conda env create --name=fastfold -f environment.yml
conda activate fastfold
bash scripts/patch_openmm.sh
python setup.py install
Using PyPI
You can download FastFold with pre-built CUDA extensions.
pip install fastfold -f https://release.colossalai.org/fastfold
Using Docker
Build On Your Own
Run the following commands to build a Docker image from the provided Dockerfile.
Building FastFold from scratch requires GPU support, so you need to set the NVIDIA Docker Runtime as the default runtime when running docker build. More details can be found here.
cd FastFold
docker build -t fastfold ./docker
Run the following command to start the docker container in interactive mode.
docker run -ti --gpus all --rm --ipc=host fastfold bash
Usage
You can use Evoformer as an nn.Module in your project after importing it from fastfold.model.fastnn:
from fastfold.model.fastnn import Evoformer
evoformer_layer = Evoformer()
If you want to use Dynamic Axial Parallelism, add a line that initializes it with fastfold.distributed.init_dap.
from fastfold.distributed import init_dap
init_dap(args.dap_size)
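As a rough illustration of what the dap_size argument controls: Dynamic Axial Parallelism distributes the Evoformer's activations along one axis across dap_size GPUs. The sketch below shows only the index arithmetic of such a split in plain Python. The helper axial_shard is hypothetical, it assumes the axis length divides evenly, and it is not FastFold's implementation.

```python
def axial_shard(length, dap_size, rank):
    """Return the [start, stop) range of an axis owned by `rank`."""
    if dap_size < 1 or not 0 <= rank < dap_size:
        raise ValueError("rank must satisfy 0 <= rank < dap_size")
    if length % dap_size != 0:
        # Assumption of this sketch only: no padding or uneven chunks.
        raise ValueError("this sketch assumes the axis divides evenly")
    chunk = length // dap_size
    return rank * chunk, (rank + 1) * chunk


# A 256-residue axis split across 2 GPUs.
for rank in range(2):
    print(rank, axial_shard(256, 2, rank))
```

Each rank holds only its own slice of the axis, which is why per-GPU memory use drops as dap_size grows.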
Download the dataset
You can download the dataset used to train FastFold with the script download_all_data.sh:
./scripts/download_all_data.sh data/
Inference
You can use FastFold with inject_fastnn. This replaces the Evoformer from OpenFold with the high-performance Evoformer from FastFold.
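from fastfold.utils import inject_fastnn
# AlphaFold, import_jax_weights_, config, and args come from the surrounding inference script
model = AlphaFold(config)
import_jax_weights_(model, args.param_path, version=args.model_name)
# Swap in FastFold's Evoformer implementation
model = inject_fastnn(model)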
For Dynamic Axial Parallelism, you can refer to ./inference.py. Here is an example of parallel inference on 2 GPUs:
python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
--output_dir ./ \
--gpus 2 \
--uniref90_database_path data/uniref90/uniref90.fasta \
--mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path data/pdb70/pdb70 \
--uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path `which jackhmmer` \
--hhblits_binary_path `which hhblits` \
--hhsearch_binary_path `which hhsearch` \
--kalign_binary_path `which kalign`
Alternatively, run the script ./inference.sh. You can change the parameters in the script, especially the data paths.
./inference.sh
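When the database paths all live under one data directory, the long command line can also be assembled programmatically. The following is a minimal sketch: build_inference_cmd is a hypothetical helper, not part of FastFold, and it only reuses the flags and relative paths shown in the example command.

```python
import shutil


def build_inference_cmd(fasta, data_dir="data", output_dir="./", gpus=2,
                        which=shutil.which):
    """Assemble the inference.py invocation as an argv list."""
    data = data_dir.rstrip("/")
    cmd = [
        "python", "inference.py", fasta, f"{data}/pdb_mmcif/mmcif_files/",
        "--output_dir", output_dir,
        "--gpus", str(gpus),
        "--uniref90_database_path", f"{data}/uniref90/uniref90.fasta",
        "--mgnify_database_path", f"{data}/mgnify/mgy_clusters_2018_12.fa",
        "--pdb70_database_path", f"{data}/pdb70/pdb70",
        "--uniclust30_database_path",
        f"{data}/uniclust30/uniclust30_2018_08/uniclust30_2018_08",
        "--bfd_database_path",
        f"{data}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
    ]
    # Resolve each alignment tool on PATH, like `which <tool>` in the shell.
    for tool in ("jackhmmer", "hhblits", "hhsearch", "kalign"):
        path = which(tool)
        if path is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
        cmd += [f"--{tool}_binary_path", path]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Failing early on a missing binary gives a clearer error than an empty argument produced by a failed which substitution in the shell.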
Inference with data workflow
AlphaFold's data pre-processing takes a lot of time, so we speed it up with a Ray workflow, which makes it about 3x faster. To run inference with the Ray workflow, install the packages below and add the parameter --enable_workflow to the command line or to the shell script ./inference.sh:
pip install ray==1.13.0 pyarrow
python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
--output_dir ./ \
--gpus 2 \
--uniref90_database_path data/uniref90/uniref90.fasta \
--mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path data/pdb70/pdb70 \
--uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path `which jackhmmer` \
--hhblits_binary_path `which hhblits` \
--hhsearch_binary_path `which hhsearch` \
--kalign_binary_path `which kalign` \
--enable_workflow
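Because --enable_workflow depends on an optional package, a launcher script may want to add the flag only when Ray is importable. The following is a minimal sketch: workflow_args is a hypothetical helper, not part of FastFold.

```python
import importlib.util


def workflow_args(module="ray"):
    """Return ['--enable_workflow'] only if `module` can be imported."""
    if importlib.util.find_spec(module) is None:
        return []
    return ["--enable_workflow"]


# Extra arguments to append to the inference.py command line.
print(workflow_args())
```

This check only confirms that a module named ray is installed. It does not verify the pinned version or that pyarrow is present.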
Performance Benchmark
We have included a performance benchmark script in ./benchmark. You can benchmark the performance of Evoformer using different settings.
cd ./benchmark
torchrun --nproc_per_node=1 perf.py --msa-length 128 --res-length 256
Benchmark Dynamic Axial Parallelism with 2 GPUs:
cd ./benchmark
torchrun --nproc_per_node=2 perf.py --msa-length 128 --res-length 256 --dap-size 2
If you want to benchmark against OpenFold, you need to install OpenFold first and run the benchmark with the option --openfold:
torchrun --nproc_per_node=1 perf.py --msa-length 128 --res-length 256 --openfold
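To benchmark several settings in one go, the commands above can be generated from a small sweep. The following is a minimal sketch: perf_cmd and sweep are hypothetical helpers, and it assumes, as in the two examples above, that the number of processes per node equals the DAP size.

```python
from itertools import product


def perf_cmd(msa_length, res_length, dap_size=1, openfold=False):
    """Build one torchrun command for perf.py as an argv list."""
    cmd = [
        "torchrun", f"--nproc_per_node={dap_size}", "perf.py",
        "--msa-length", str(msa_length),
        "--res-length", str(res_length),
    ]
    if dap_size > 1:
        cmd += ["--dap-size", str(dap_size)]
    if openfold:
        cmd.append("--openfold")
    return cmd


def sweep(msa_lengths, res_lengths, dap_sizes=(1,)):
    """One command per combination of settings."""
    return [perf_cmd(m, r, d)
            for m, r, d in product(msa_lengths, res_lengths, dap_sizes)]


for cmd in sweep([128], [256, 512], dap_sizes=(1, 2)):
    print(" ".join(cmd))
```

Run the generated commands from inside ./benchmark, for example with subprocess.run(cmd, cwd="benchmark", check=True).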
Cite us
If you use FastFold in a research publication, please cite this paper.
@misc{cheng2022fastfold,
title={FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours},
author={Shenggan Cheng and Ruidong Wu and Zhongming Yu and Binrui Li and Xiwen Zhang and Jian Peng and Yang You},
year={2022},
eprint={2203.00854},
archivePrefix={arXiv},
primaryClass={cs.LG}
}