TaxDiff
The official code for "TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation"
TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation
If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.
The official code for "TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation", submitted to ICML 2024. Here we publish the inference code of TaxDiff. The training code and the dataset of protein sequences with taxonomic labels will be released after our paper is accepted.
💡 I also have other AI for Science projects that may interest you ✨: ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing
Liuzhenghao Lv, Zongying Lin, Li Hao, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, Yonghong Tian
😮 Highlights
💡 Protein Sequence Generation Model
- To the best of our knowledge, our TaxDiff is the first controllable protein generation model utilizing guidance from taxonomies.
🔥 Diffusion-based Framework
- TaxDiff proposes a taxonomic-guided framework that fits all diffusion-based protein design models. We also propose the patchify attention mechanism for better protein design.
⭐ Excellent performance
- Experiments demonstrate that our TaxDiff achieves state-of-the-art results in both taxonomic-guided controllable and unconditional protein sequence generation, excelling in structural modeling scores and sequence consistency.
🚀 Main Results
More detailed results can be found in our paper.
Unconditional Generation
Controllable Generation
📖 Data Preparation
For inference, please download the checkpoint from HuggingFace and put it into the ckpt/ folder:
ckpt/0012802.ckpt
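Before sampling, it can help to confirm that the checkpoint sits where the path above says it should. The following is a minimal sketch, assuming the path ckpt/0012802.ckpt relative to the repository root; the helper name `checkpoint_ready` is hypothetical and not part of the repository.

```python
from pathlib import Path


def checkpoint_ready(path: str = "ckpt/0012802.ckpt") -> bool:
    """Return True if the checkpoint file exists and is not empty."""
    ckpt = Path(path)
    return ckpt.is_file() and ckpt.stat().st_size > 0


if __name__ == "__main__":
    if checkpoint_ready():
        print("Checkpoint found.")
    else:
        print("Checkpoint missing: download it from HuggingFace into ckpt/ first.")
```

The size check only catches an interrupted or empty download; it does not verify that the file is a valid checkpoint.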
We will release the protein sequences with taxonomic labels used for training once our paper is accepted.
If you want to generate proteins for a specific taxonomic group, first find its corresponding tax-id in data_reader/Taxonnmic_classfication.xlsx, then modify the protein class labels in sample_protein.py. By default the labels are drawn at random:
class_lables = torch.randint(low=1, high=int(23427), size=(1,num))
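To condition every sample on one chosen taxonomic group, the random labels can be replaced with a constant label. The sketch below rests on two assumptions read off the `torch.randint` call: valid ids lie in the range 1 to 23426 (the upper bound of `randint` is exclusive), and the label tensor has shape (1, num). The helper name `fixed_class_labels` and the id 42 are hypothetical placeholders. It returns a plain nested list so that it runs without PyTorch.

```python
def fixed_class_labels(tax_id: int, num: int, max_id: int = 23427) -> list[list[int]]:
    """Build a (1, num) nested list that repeats a single tax-id.

    Assumption: valid ids lie in [1, max_id), mirroring
    torch.randint(low=1, high=23427, ...) in sample_protein.py.
    """
    if not 1 <= tax_id < max_id:
        raise ValueError(f"tax_id {tax_id} is outside [1, {max_id})")
    if num < 1:
        raise ValueError("num must be at least 1")
    return [[tax_id] * num]


# 42 is a placeholder; look up the real id in the spreadsheet first.
labels = fixed_class_labels(tax_id=42, num=4)
print(labels)
```

In sample_protein.py the result could then be wrapped as `class_lables = torch.tensor(labels)`, which yields an integer tensor of shape (1, 4), the same layout the random version produces.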
🛠️ Requirements and Installation
- Python == 3.10
- Pytorch == 2.2.0
- Torchvision == 0.17.0
- CUDA Version == 12.0
- Install required packages:
git clone git@github.com:Linzy19/TaxDiff.git
cd TaxDiff
pip install -r requirements.txt
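After installation, the environment can be compared against the versions listed above. This is a rough sketch using only the standard library; it assumes the packages are installed under their usual PyPI distribution names `torch` and `torchvision`, and the function name `check_environment` is hypothetical. It does not check the CUDA version.

```python
import sys
from importlib import metadata

# Versions taken from the requirements list; the distribution names
# "torch" and "torchvision" are an assumption.
EXPECTED = {"torch": "2.2.0", "torchvision": "0.17.0"}


def check_environment(expected: dict[str, str] = EXPECTED) -> list[str]:
    """Return human-readable mismatches; an empty list means everything matches."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        problems.append(
            f"Python 3.10 expected, found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        # Builds with a local tag report e.g. "2.2.0+cu121"; compare the public part.
        if found.split("+")[0] != wanted:
            problems.append(f"{name} {wanted} expected, found {found}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the listed versions."]:
        print(line)
```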
🗝️ Inferencing
Inference is run through sample_protein.py:
python sample_protein.py --model DiT-pro-12-h6-L16 --cuda-num cuda:0 --num 500
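When sampling is launched from another Python script, for example to sweep over several sample counts, the command above can be assembled programmatically. The sketch below uses only the three flags shown in the command; the helper names `build_sample_command` and `run_sampling` are hypothetical. Judging from the label tensor of size (1, num), `--num` appears to set how many sequences are sampled, but that reading is an assumption.

```python
import subprocess
import sys


def build_sample_command(model: str = "DiT-pro-12-h6-L16",
                         device: str = "cuda:0",
                         num: int = 500) -> list[str]:
    """Assemble the argument list for sample_protein.py."""
    return [sys.executable, "sample_protein.py",
            "--model", model, "--cuda-num", device, "--num", str(num)]


def run_sampling(**kwargs) -> None:
    """Run sample_protein.py; call this from the repository root."""
    subprocess.run(build_sample_command(**kwargs), check=True)


print(" ".join(build_sample_command(num=100)[1:]))
```

Using `sys.executable` keeps the child process in the same interpreter and virtual environment as the caller, and `check=True` raises an error if the sampling script exits with a failure code.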
✏️ Citation
If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing our work ✏️.
@article{zongying2024taxdiff,
title={TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation},
author={Zongying, Lin and Hao, Li and Liuzhenghao, Lv and Bin, Lin and Junwu, Zhang and Yu-Chian, Chen Calvin and Li, Yuan and Yonghong, Tian},
journal={arXiv preprint arXiv:2402.17156},
year={2024}
}