papers_for_protein_design_using_DL
List of papers about Proteins Design using Deep Learning
About this repository
Inspired by Kevin Kaichuang Yang's Machine-learning-for-proteins. Given the rapid development of deep learning for protein design, we started this living repository to record the latest papers and projects in the field for newcomers like us:
- Mini-protein, binder, metalloprotein, antibody, peptide and molecule designs are included.
- A further de novo protein design paper list is at Wangchentong's GitHub repo: paper_for_denovo_protein_design.
- Our notes on these papers are shared in a Zhihu Column (simplified Chinese/English); more suggested notes are at RosettAI.
Contributions are welcome!
Menu
Heading [2] follows a "generator-predictor-optimizer" paradigm; Headings [3], [4] & [6] follow the "inside-out" paradigm (function-scaffold-sequence) from RosettaCommons; Headings [5] & [7] follow other ML/DL strategies.
- List of papers about Proteins Design using Deep Learning
  - About this repository
  - Menu
  - 0. Benchmarks and datasets
    - 0.1 Function to sequence
    - 0.2 Structure to sequence
    - 0.3 Others
      - 0.3.1 Sequence Database
      - 0.3.2 Structure Database
      - 0.3.3 Protein Structure Datasets
  - 1. Reviews
    - 1.1 De novo protein design
    - 1.2 Antibody design
    - 1.3 Peptide design
    - 1.4 Binder design
  - 2. Model-based design
    - 2.1 trRosetta-based
    - 2.2 AlphaFold2-based
    - 2.3 DMPfold2-based
    - 2.4 CM-Align
    - 2.5 MSA-transformer-based
    - 2.6 DeepAb-based
    - 2.7 TRFold2-based
  - 3. Function to Scaffold
    - 3.1 GAN-based
    - 3.2 VAE-based
    - 3.3 DAE-based
    - 3.4 MLP-based
    - 3.5 Diffusion-based
    - 3.6 Score-based
  - 4. Scaffold to Sequence
    - 4.1 MLP-based
    - 4.2 VAE-based
    - 4.3 LSTM-based
    - 4.4 CNN-based
    - 4.5 GNN-based
    - 4.6 GAN-based
    - 4.7 Transformer-based
    - 4.8 ResNet-based
    - 4.9 Diffusion-based
  - 5. Function to Sequence
    - 5.1 CNN-based
    - 5.2 VAE-based
    - 5.3 GAN-based
    - 5.4 Transformer-based
    - 5.5 ResNet-based
    - 5.6 Bayesian-based
    - 5.7 RL-based
    - 5.8 Flow-based
    - 5.9 RNN-based
    - 5.10 LSTM-based
  - 6. Function to Structure
    - 6.1 LSTM-based
    - 6.2 Diffusion-based
    - 6.3 RoseTTAFold-based
    - 6.4 Masif-based
  - 7. Other tasks
    - 7.1 Effects of mutation & Fitness Landscape
    - 7.2 Protein Language Models (PTM) and representation learning
    - 7.3 Molecular Design Models
      - 7.3.1 Gradient optimization
      - 7.3.2 Optimized sampling
0. Benchmarks and datasets
0.1 Function to sequence
FLIP: Benchmark tasks in fitness landscape inference for proteins
Christian Dallago, Jody Mou, Kadina E Johnston, Bruce Wittmann, Nick Bhattacharya, Samuel Goldman, Ali Madani, Kevin K Yang
NeurIPS 2021 Datasets and Benchmarks Track || website
0.2 Structure to sequence
AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB
Zhangyang Gao, Cheng Tan, Stan Z. Li
arXiv (2022)
0.3 Others
A list of suggested protein databases, more lists at CNCB.
0.3.1 Sequence Database
0.3.2 Structure Database
0.3.3 Protein Structure Datasets
SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning
Jonathan E. King, David Ryan Koes
arXiv || github::sidechainnet
TDC maintains a resource list that currently contains 22 tasks (with their datasets) related to small molecules and macromolecules, including PPI, DDI and so on. MoleculeNet published a small-molecule benchmark four years ago.
In terms of datasets and benchmarks, protein design is far less mature than drug discovery (see the Papers with Code drug discovery benchmarks). Evaluations of deep learning methods for protein design, especially deep generative models, may be worth adding here.
Difficulties and opportunities always coexist. We are happy to see the work of Christian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani and Kevin K. Yang, and of Zhangyang Gao, Cheng Tan and Stan Z. Li.
1. Reviews
1.1 De novo protein design
Deep learning in protein structural modeling and design
Wenhao Gao, Sai Pooja Mahajan, Jeremias Sulam, and Jeffrey J. Gray
Patterns 1.9 || 2020
Protein sequence design with deep generative models
Zachary Wu, Kadina E. Johnston, Frances H. Arnold, Kevin K. Yang
Current Opinion in Chemical Biology || note || 2021
Structure-based protein design with deep learning
Ovchinnikov, Sergey, and Po-Ssu Huang.
Current Opinion in Chemical Biology || note || 2021
Protein design via deep learning
Wenze Ding, Kenta Nakai, Haipeng Gong
Briefings in Bioinformatics || 25 March 2022
Deep generative modeling for protein design
Strokach, Alexey, and Philip M. Kim.
Current Opinion in Structural Biology || 2022
1.2 Antibody design
A review of deep learning methods for antibodies
Graves, Jordan, et al.
Antibodies 9.2 (2020)
1.3 Peptide design
Deep generative models for peptide design
Wan, Fangping, Daphne Kontogiorgos-Heintz, and Cesar de la Fuente-Nunez
Digital Discovery (2022)
1.4 Binder design
Improving de novo Protein Binder Design with Deep Learning
Nathaniel Bennett, Brian Coventry, Inna Goreshnik, Buwei Huang, Aza Allen, Dionne Vafeados, Ying Po Peng, Justas Dauparas, Minkyung Baek, Lance Stewart, Frank DiMaio, Steven De Munck, Savvas Savvides, David Baker
bioRxiv 2022.06.15.495993
2. Model-based design
These methods invert trained models for sequence design by iteratively optimizing the input sequence with an optimization algorithm. Inverted structure prediction models are known as hallucination.
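The inversion loop can be sketched as follows. This is a toy illustration only, not the method of any paper listed here: `toy_predictor` is a hypothetical stand-in for a frozen, trained network (such as a structure predictor), and the helix-former score, the single-site mutation scheme and the linear temperature schedule are all assumptions chosen to keep the example self-contained.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HELIX_FORMERS = set("AELMQKR")  # assumption: crude stand-in for a learned objective


def toy_predictor(sequence):
    """Hypothetical frozen 'model': returns a score in [0, 1], higher is better."""
    return sum(residue in HELIX_FORMERS for residue in sequence) / len(sequence)


def hallucinate(length=30, steps=2000, start_temp=0.1, seed=0):
    """Simulated-annealing search over sequences against a fixed predictor."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = toy_predictor(sequence)
    initial_score = score
    best_sequence, best_score = list(sequence), score
    for step in range(steps):
        # Linearly cool the temperature; the small offset avoids division by zero.
        temperature = start_temp * (1.0 - step / steps) + 1e-6
        candidate = list(sequence)
        candidate[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        candidate_score = toy_predictor(candidate)
        delta = candidate_score - score
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            sequence, score = candidate, candidate_score
            if score > best_score:
                best_sequence, best_score = list(sequence), score
    return "".join(best_sequence), initial_score, best_score


if __name__ == "__main__":
    designed, before, after = hallucinate()
    print(designed, round(before, 2), round(after, 2))
```

In a real setting the predictor would be a trained network and the optimizer might use gradients through the model instead of random mutations; only the propose-score-accept structure is meant to carry over.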
2.1 trRosetta-based
Design of proteins presenting discontinuous functional sites using deep learning
Doug Tischer, Sidney Lisanza, Jue Wang, Runze Dong, Ivan Anishchenko, Lukas F. Milles, Sergey Ovchinnikov, David Baker
bioRxiv (2020)
Fast differentiable DNA and protein sequence optimization for molecular design
Linder, Johannes, and Georg Seelig.
arXiv preprint arXiv:2005.11275 (2020)
De novo protein design by deep network hallucination
Ivan Anishchenko, Samuel J. Pellock, Tamuka M. Chidyausiku, Theresa A. Ramelot, Sergey Ovchinnikov, Jingzhou Hao, Khushboo Bafna, Christoffer Norn, Alex Kang, Asim K. Bera, Frank DiMaio, Lauren Carter, Cameron M. Chow, Gaetano T. Montelione & David Baker
Nature (2021) || code || trRosetta
Protein sequence design by conformational landscape optimization
Norn, Christoffer, et al.
Proceedings of the National Academy of Sciences 118.11 (2021) || code
2.2 AlphaFold2-based
Solubility-aware protein binding peptide design using AlphaFold
Takatsugu Kosugi, Masahito Ohue
bioRxiv 2022.05.14.491955 || Supplemental Materials
End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
Petti, Samantha, Bhattacharya, Nicholas, Rao, Roshan, Dauparas, Justas, Thomas, Neil, Zhou, Juannan, Rush, Alexander M, Koo, Peter K, Ovchinnikov, Sergey
bioRxiv (2021) || ColabDesign, SMURF, AF2 back propagation || our notes1, notes2 || lecture || Discord
AlphaDesign: A de novo protein design framework based on AlphaFold
Jendrusch, Michael, Jan O. Korbel, and S. Kashif Sadiq.
bioRxiv (2021)
Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design
Moffat, Lewis, Joe G. Greener, and David T. Jones.
bioRxiv (2021)
Hallucinating protein assemblies
Basile I M Wicky, Lukas F Milles, Alexis Courbet, Robert J Ragotte, Justas Dauparas, Elias Kinfu, Sam Tipps, Ryan D Kibler, Minkyung Baek, Frank DiMaio, Xinting Li, Lauren Carter, Alex Kang, Hannah Nguyen, Asim K Bera, David Baker
bioRxiv 2022.06.09.493773 || related slides || our notes
EvoBind: in silico directed evolution of peptide binders with AlphaFold
Patrick Bryant, Arne Elofsson
bioRxiv 2022.07.23.501214 || code
2.3 DMPfold2-based
Design in the DARK: Learning Deep Generative Models for De Novo Protein Design
Moffat, Lewis, Shaun M. Kandathil, and David T. Jones.
bioRxiv (2022) || DMPfold2
2.4 CM-Align
AutoFoldFinder: An Automated Adaptive Optimization Toolkit for De Novo Protein Fold Design
Shuhao Zhang, Youjun Xu, Jianfeng Pei, Luhua Lai
NeurIPS 2021
2.5 MSA-transformer-based
Protein language models trained on multiple sequence alignments learn phylogenetic relationships
Lupo, Umberto, Damiano Sgarbossa, and Anne-Florence Bitbol.
arXiv preprint arXiv:2203.15465 (2022)
2.6 DeepAb-based
Towards deep learning models for target-specific antibody design
Mahajan, Sai Pooja, et al.
Biophysical Journal 121.3 (2022) || DeepAb || lecture
Hallucinating structure-conditioned antibody libraries for target-specific binders
Sai Pooja Mahajan, Jeffrey A Ruffolo, Rahel Frick, Jeffrey J. Gray
bioRxiv 2022.06.06.494991 || Supplementary
2.7 TRFold2-based
TRDesign
TIANRANG XLab
Unpublished yet (June 2022) || code unavailable
3. Function to Scaffold
These models design backbone/scaffold/template.
3.1 GAN-based
Conditioning by adaptive sampling for robust design
Brookes, David, Hahnbeom Park, and Jennifer Listgarten.
International conference on machine learning. PMLR, 2019 || without code
Fully differentiable full-atom protein backbone generation
Anand, Namrata, Raphael Eguchi, and Po-Ssu Huang.
OpenReview ICLR 2019 workshop DeepGenStruct || without code
RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative neural network
Sabban, Sari, and Mikhail Markovsky.
F1000Research 9 (2020) || code || pyRosetta || tensorflow || maximizing the fluorescence of a protein
3.2 VAE-based
IG-VAE: generative modeling of immunoglobulin proteins by direct 3D coordinate generation
Raphael R. Eguchi, Christian A. Choe, Po-Ssu Huang
bioRxiv (2020) || without code
Deep sharpening of topological features for de novo protein design
Harteveld, Zander, et al.
ICLR2022 Machine Learning for Drug Discovery. 2022
End-to-End deep structure generative model for protein design
Boqiao Lai, Matthew McPartlon, Jinbo Xu
bioRxiv 2022.07.09.499440
3.3 DAE-based
Function-guided protein design by deep manifold sampling
Vladimir Gligorijevic, Stephen Ra, Daniel Berenberg, Richard Bonneau, Kyunghyun Cho
NeurIPS 2021 || without code
3.4 MLP-based
A backbone-centred energy function of neural networks for protein design
Huang, B., Xu, Y., Hu, X., et al.
Nature (2022)
3.5 Diffusion-based
Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem
Brian L. Trippe, Jason Yim, Doug Tischer, Tamara Broderick, David Baker, Regina Barzilay, Tommi Jaakkola
arXiv:2206.04119
3.6 Score-based
ProteinSGM: Score-based generative modeling for de novo protein design
Jin Sub Lee, Philip M Kim
bioRxiv 2022.07.13.499967
4. Scaffold to Sequence
Identify the amino acid sequence from given backbone/scaffold/template constraints: torsion angles (φ & ψ), backbone angles (θ & τ), backbone dihedrals (φ, ψ & ω), backbone atoms (Cα, N, C & O), Cα−Cα distances, unit direction vectors of Cα−Cα, Cα−N & Cα−C, etc. (a.k.a. inverse folding). Referred from here.
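Several of the backbone features named above can be computed directly from N, Cα and C coordinates. The sketch below assumes a hypothetical input format (a list of dicts with "N", "CA" and "C" coordinate tuples); the function names are illustrative and do not come from any listed paper. The torsions use the standard definitions: φ over C(i-1), N(i), Cα(i), C(i); ψ over N(i), Cα(i), C(i), N(i+1); ω over Cα(i), C(i), N(i+1), Cα(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit_vector(start, end):
    """Unit direction vector pointing from `start` to `end`."""
    length = math.dist(start, end)
    return tuple(c / length for c in _sub(end, start))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def backbone_torsions(residues):
    """Return (phi, psi, omega) per residue; None where a neighbor is missing."""
    torsions = []
    for i, res in enumerate(residues):
        prev_res = residues[i - 1] if i > 0 else None
        next_res = residues[i + 1] if i + 1 < len(residues) else None
        phi = dihedral(prev_res["C"], res["N"], res["CA"], res["C"]) if prev_res else None
        psi = dihedral(res["N"], res["CA"], res["C"], next_res["N"]) if next_res else None
        omega = dihedral(res["CA"], res["C"], next_res["N"], next_res["CA"]) if next_res else None
        torsions.append((phi, psi, omega))
    return torsions


def ca_distance_matrix(residues):
    """Pairwise Cα-Cα distances as a nested list."""
    ca = [res["CA"] for res in residues]
    return [[math.dist(a, b) for b in ca] for a in ca]
```

Inverse-folding models typically consume such features as node or edge inputs; which features are used, and how they are encoded, differs between the methods listed in this section.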
4.1 MLP-based
3D representations of amino acids—applications to protein sequence comparison and classification
Li, Jie, and Patrice Koehl.
Computational and structural biotechnology journal 11.18 (2014) || 2014
Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment‐based local and energy‐based nonlocal profiles
Li, Zhixiu, et al.
Proteins: Structure, Function, and Bioinformatics 82.10 (2014) || code unavailable
SPIN2: Predicting sequence profiles from protein structures using deep neural networks
O'Connell, James, et al.
Proteins: Structure, Function, and Bioinformatics 86.6 (2018) || code unavailable
Computational protein design with deep learning neural networks
Wang, Jingxue, et al.
Scientific reports 8.1 (2018) || code unavailable
4.2 VAE-based
Design of metalloproteins and novel protein folds using variational autoencoders
Greener, Joe G., Lewis Moffat, and David T. Jones.
Scientific reports 8.1 (2018)
4.3 LSTM-based
To improve protein sequence profile prediction through image captioning on pairwise residue distance map
Chen, Sheng, et al.
Journal of chemical information and modeling 60.1 (2019) || SPROF
Deep learning of Protein Sequence Design of Protein-protein Interactions
Syrlybaeva, Raulia, and Eva-Maria Strauch.
bioRxiv (2022) || Supplementary || code
4.4 CNN-based
A structure-based deep learning framework for protein engineering
Shroff, Raghav, et al.
bioRxiv (2019)
ProDCoNN: Protein design using a convolutional neural network
Zhang, Yuan, et al.
Proteins: Structure, Function, and Bioinformatics 88.7 (2020) || code unavailable
Protein sequence design with a learned potential
Namrata Anand, Raphael Eguchi, Irimpan I. Mathews, Carla P. Perez, Alexander Derry, Russ B. Altman & Po-Ssu Huang
Nature Communications (2022) || code
Protein Sequence Design with Deep Learning and Tooling like Monte Carlo Sampling and Analysis
Leonardo Castorina
paper not available || code
4.5 GNN-based
Learning from protein structure with geometric vector perceptrons
Jing, Bowen, et al.
arXiv preprint arXiv:2009.01411 (2020) || GVP
Fast and flexible protein design using deep graph neural networks
Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, Philip M. Kim
Cell Systems (2020) || code::ProteinSolver
TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs
Li, Alex J., et al.
NeurIPS 2021 / arXiv (2022)
Iterative refinement graph neural network for antibody sequence-structure co-design
Jin, Wengong, et al.
arXiv preprint arXiv:2110.04624 (2021) || RefineGNN || lecture1, lecture2
A neural network model for prediction of amino-acid probability from a protein backbone structure
Koya Sakuma, Naoya Kobayashi
Unpublished yet (June 2021) || GCNdesign
XENet: Using a new graph convolution to accelerate the timeline for protein design on quantum computers
Maguire, Jack B., et al.
PLoS computational biology 17.9 (2021)
AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB
Gao, Zhangyang, Cheng Tan, and Stan Li.
arXiv preprint arXiv:2202.01079 (2022) || code
Generative De Novo Protein Design with Global Context
Cheng Tan, Zhangyang Gao, Jun Xia and Stan Z. Li
arXiv || Apr 2022 || code
Masked inverse folding with sequence transfer for protein representation learning
Kevin K Yang, Hugh Yeh, Niccolò Zanichelli
bioRxiv 2022.05.25.493516 || code || model
Robust deep learning based protein sequence design using ProteinMPNN
Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Alexis Courbet, Robbert J. de Haas, Neville Bethel, Philip J. Y. Leung, Timothy F. Huddy, Sam Pellock, Doug Tischer, Frederick Chan, Brian Koepnick, Hannah Nguyen, Alex Kang, Banumathi Sankaran, Asim Bera, Neil P. King, David Baker
bioRxiv 2022.06.03.494563 || code || hugging face
Neural Network-Derived Potts Models for Structure-Based Protein Design using Backbone Atomic Coordinates and Tertiary Motifs
Alex J. Li, Mindren Lu, Israel Desta, Vikram Sundar, Gevorg Grigoryan, and Amy E. Keating
bioRxiv 2022.08.02.501736
Conditional Antibody Design as 3D Equivariant Graph Translation
Xiangzhe Kong, Wenbing Huang, Yang Liu
arXiv:2208.06073
4.6 GAN-based
De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks
Mostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen
Journal of chemical information and modeling 60.12 (2020) || gcWGAN
HelixGAN: A bidirectional Generative Adversarial Network with search in latent space for generation under constraints
Xuezhi Xie, Philip M. Kim
Machine Learning for Structural Biology Workshop, NeurIPS 2021 || without code
4.7 Transformer-based
Generative models for graph-based protein design
John Ingraham, Vikas K Garg, Regina Barzilay, Tommi Jaakkola
NeurIPS 2019 || GraphTrans
Fold2Seq: A Joint Sequence (1D)-Fold (3D) Embedding-based Generative Model for Protein Design
Cao, Yue, et al.
International Conference on Machine Learning. PMLR, 2021
Rotamer-Free Protein Sequence Design Based on Deep Learning and Self-Consistency
Liu, Yufeng, et al.
Nature Portfolio (2022)/Nature Computational Science (2022) || Supplementary || Comment
A Deep SE(3)-Equivariant Model for Learning Inverse Protein Folding
Matthew McPartlon, Ben Lai, Jinbo Xu
bioRxiv (2022)
Learning inverse folding from millions of predicted structures
Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, Alexander Rives
bioRxiv (2022) || esm
Accurate and efficient protein sequence design through learning concise local environment of residues
Huang, Bin, et al.
bioRxiv (2022) || Supplementary
PeTriBERT: Augmenting BERT with tridimensional encoding for inverse protein folding and design
Baldwin Dumortier, Antoine Liutkus, Clément Carré, Gabriel Krouk
bioRxiv 2022.08.10.503344
4.8 ResNet-based
DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet
Qi, Yifei, and John ZH Zhang.
Journal of chemical information and modeling 60.3 (2020) || code unavailable
4.9 Diffusion-based
Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models
Shitong Luo, Yufeng Su, Xingang Peng, Sheng Wang, Jian Peng, Jianzhu Ma
bioRxiv 2022.07.10.499510
5. Function to Sequence
These models generate sequences from expected function.
5.1 CNN-based
Protein design and variant prediction using autoregressive generative models
Shin, Jung-Eun, et al.
Nature communications 12.1 (2021) || code::SeqDesign || mutation effect prediction || sequence generation || April 2021
5.2 VAE-based
Variational auto-encoding of protein sequences
Sinai, Sam, et al.
arXiv preprint arXiv:1712.03346 (2017)
Pepcvae: Semi-supervised targeted design of antimicrobial peptide sequences
Das, Payel, et al.
arXiv preprint arXiv:1810.07743 (2018)
Deep generative models for T cell receptor protein sequences
Davidsen, Kristian, et al.
Elife 8 (2019)
How to hallucinate functional proteins
Costello, Zak, and Hector Garcia Martin.
arXiv preprint arXiv:1903.00458 (2019)
Variational autoencoder for generation of antimicrobial peptides
Dean, Scott N., and Scott A. Walper.
ACS omega 5.33 (2020)
Generating functional protein variants with variational autoencoders
Hawkins-Hooker, Alex, et al.
PLoS computational biology 17.2 (2021)
Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations
Das, Payel, et al.
Nature Biomedical Engineering 5.6 (2021)
Deep generative models create new and diverse protein structures
Zeming Lin, Tom Sercu, Yann LeCun, Alexander Rives
NeurIPS 2021
Therapeutic enzyme engineering using a generative neural network
Giessel, Andrew, et al.
Scientific Reports 12.1 (2022)
GM-Pep: A High Efficiency Strategy to De Novo Design Functional Peptide Sequences
Chen, Qushuo, et al.
Journal of Chemical Information and Modeling (2022) || code
5.3 GAN-based
Generative modeling for protein structures
Anand, Namrata, and Po-Ssu Huang.
NeurIPS 2018
Generating protein sequences from antibiotic resistance genes data using Generative Adversarial Networks
Chhibbar, Prabal, and Arpit Joshi.
arXiv preprint arXiv:1904.13240 (2019)
ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework
Han, Xi, et al.
Computers & Chemical Engineering 131 (2019)
GANDALF: Peptide Generation for Drug Design using Sequential and Structural Generative Adversarial Networks
Rossetto, Allison, and Wenjin Zhou.
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 2020
Generating ampicillin-level antimicrobial peptides with activity-aware generative adversarial networks
Tucs, Andrejs, et al.
ACS omega 5.36 (2020)
Conditional Generative Modeling for De Novo Protein Design with Hierarchical Functions
Kucera, Tim, Matteo Togninalli, and Laetitia Meng-Papaxanthos
bioRxiv (2021)/Bioinformatics 38.13 (2022) || code
Expanding functional protein sequence spaces using generative adversarial networks
Repecka, Donatas, et al.
Nature Machine Intelligence 3.4 (2021)
A Generative Approach toward Precision Antimicrobial Peptide Design
Ferrell, Jonathon B., et al.
bioRxiv (2021)
AMPGAN v2: Machine Learning-Guided Design of Antimicrobial Peptides
Van Oort, Colin M., et al.
Journal of chemical information and modeling 61.5 (2021)
DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity
Li, Guangyuan, et al.
Briefings in bioinformatics 22.6 (2021)
PandoraGAN: Generating antiviral peptides using Generative Adversarial Network
Surana, Shraddha, et al.
bioRxiv (2021)
5.4 Transformer-based
Progen: Language modeling for protein generation
Madani, Ali, et al.
arXiv preprint arXiv:2004.03497 (2020)
Signal peptides generated by attention-based neural networks
Wu, Zachary, et al.
ACS Synthetic Biology 9.8 (2020)
Generative Language Modeling for Antibody Design
Shuai, Richard W., Jeffrey A. Ruffolo, and Jeffrey J. Gray.
bioRxiv (2021)
Deep neural language modeling enables functional protein generation across families
Madani, Ali, et al.
bioRxiv (2021)
BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning
Prihoda, David, et al.
mAbs. Vol. 14. No. 1. Taylor & Francis, 2022
Guided Generative Protein Design using Regularized Transformers
Castro, Egbert, et al.
arXiv preprint arXiv:2201.09948 (2022)
ProtGPT2 is a deep unsupervised language model for protein design
Noelia Ferruz, Steffen Schmidt, Birte Höcker
bioRxiv/Nature Communications || model::huggingface || datasets::huggingface || lecture
Few Shot Protein Generation
Ram, Soumya, and Tristan Bepler.
arXiv preprint arXiv:2204.01168 (2022)
Towards Controllable Protein design with Conditional Transformers
Ferruz, Noelia, and Birte Höcker.
arXiv preprint arXiv:2201.07338 (2022)/Nature Machine Intelligence (2022) || review of Heading 5.4
ProGen2: Exploring the Boundaries of Protein Language Models
Erik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani
arXiv:2206.13517 || code
AbBERT: Learning Antibody Humanness via Masked Language Modeling
Denis Vashchenko, Sam Nguyen, Andre Goncalves, Felipe Leno da Silva, Brenden Petersen, Thomas Desautels, Daniel Faissol
bioRxiv 2022.08.02.502236
5.5 ResNet-based
Accelerating protein design using autoregressive generative models
Riesselman, Adam, et al.
bioRxiv (2019)
5.6 Bayesian-based
Discovering de novo peptide substrates for enzymes using machine learning
Tallorin, Lorillee, et al.
Nature communications 9.1 (2018) || code
Now What Sequence? Pre-trained Ensembles for Bayesian Optimization of Protein Sequences
Ziyue Yang, Katarina A Milas, Andrew D White
bioRxiv 2022.08.05.502972 || code || Supplementary
Lattice protein design using Bayesian learning
Takahashi, Tomoei, George Chikenji, and Kei Tokita.
arXiv:2003.06601/Physical Review E 104.1 (2021): 014404
AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation
Khan, Asif, et al.
arXiv preprint (2022)
Statistical Mechanics of Protein Design
Takahashi, Tomoei, George Chikenji, and Kei Tokita.
arXiv preprint arXiv:2205.03696 (2022)
5.7 RL-based
Model-based reinforcement learning for biological sequence design
Angermueller, Christof, et al.
International conference on learning representations. 2019
5.8 Flow-based
Biological Sequence Design with GFlowNets
Jain, Moksh, et al.
arXiv preprint arXiv:2203.04115 (2022) || lecture
5.9 RNN-based
Deep learning to design nuclear-targeting abiotic miniproteins
Schissel, Carly K., et al.
Nature Chemistry 13.10 (2021) || code
Recurrent neural network model for constructive peptide design
Müller, Alex T., Jan A. Hiss, and Gisbert Schneider.
Journal of chemical information and modeling 58.2 (2018)
Machine learning designs non-hemolytic antimicrobial peptides
Capecchi, Alice, et al.
Chemical Science 12.26 (2021)
Using molecular dynamics simulations to prioritize and understand AI-generated cell penetrating peptides
Tran, Duy Phuoc, et al.
Scientific reports 11.1 (2021)
5.10 LSTM-based
Computational antimicrobial peptide design and evaluation against multidrug-resistant clinical isolates of bacteria
Nagarajan, Deepesh, et al.
Journal of Biological Chemistry 293.10 (2018)
Deep learning enables the design of functional de novo antimicrobial proteins
Caceres-Delpiano, Javier, et al.
bioRxiv (2020)
ECNet is an evolutionary context-integrated deep learning framework for protein engineering
Luo, Yunan, et al.
Nature communications 12.1 (2021)
Deep learning for novel antimicrobial peptide design
Wang, Christina, Sam Garlick, and Mire Zloh.
Biomolecules 11.3 (2021)
Deep learning to design nuclear-targeting abiotic miniproteins
Schissel, Carly K., et al.
Nature Chemistry 13.10 (2021)
In silico proof of principle of machine learning-based antibody design at unconstrained scale
Akbar, Rahmad, et al.
mAbs. Vol. 14. No. 1. Taylor & Francis, 2022 || code
6. Function to Structure
These models generate structures (including side chains) from the expected function, or recover part of a structure (a.k.a. inpainting).
6.1 LSTM-based
One-sided design of protein-protein interaction motifs using deep learning
Syrlybaeva, Raulia, and Eva-Maria Strauch.
bioRxiv (2022) || code || our notes || lecture
6.2 Diffusion-based
Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models
Namrata Anand, Tudor Achim
GitHub (2022)/arXiv (2022) || our notes || lecture
6.3 RoseTTAFold-based
Deep learning methods for designing proteins scaffolding functional sites
Wang J, Lisanza S, Juergens D, Tischer D, Anishchenko I, Baek M, Watson JL, Chun JH, Milles LF, Dauparas J, Expòsit M, Yang W, Saragovi A, Ovchinnikov S, Baker D
bioRxiv (2021)/Science (2022) || RFDesign || our notes || lecture || RoseTTAFold || Supplementary, Other supplementary
6.4 Masif-based
De Novo Design of Site-specific Protein Binders Using Surface Fingerprints
Wehrle, Sarah, et al.
Protein Science 30.CONF (2021)/bioRxiv (2022) || Supplementary || masif_seed || masif
7. Other tasks
7.1 Effects of mutation & Fitness Landscape
Deep generative models of genetic variation capture the effects of mutations
Adam J. Riesselman, John B. Ingraham & Debora S. Marks
Nature Methods || code::DeepSequence || Oct 2018
Deciphering protein evolution and fitness landscapes with latent space models
Xinqiang Ding, Zhengting Zou & Charles L. Brooks III
Nature Communications || code::PEVAE || Dec 2019
The generative capacity of probabilistic protein sequence models
Francisco McGee, Sandro Hauri, Quentin Novinger, Slobodan Vucetic, Ronald M. Levy, Vincenzo Carnevale & Allan Haldane
Nature Communications || code::generation_capacity_metrics || code::sVAE || Nov 2021
Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions
Amirali Aghazadeh, Hunter Nisonoff, Orhan Ocal, David H. Brookes, Yijie Huang, O. Ozan Koyluoglu, Jennifer Listgarten & Kannan Ramchandran
Nature Communications || code || Sep 2021
Proximal Exploration for Model-guided Protein Sequence Design
Zhizhou Ren, Jiahan Li, Fan Ding, Yuan Zhou, Jianzhu Ma, Jian Peng
bioRxiv (2022)
Efficient evolution of human antibodies from general protein language models and sequence information alone
Hie, Brian L., et al.
bioRxiv (2022) || code
Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval
Notin, P., Dias, M., Frazer, J., Marchena-Hurtado, J., Gomez, A., Marks, D.S., Gal, Y.
ICML (2022)/arXiv:2205.13760 || code || hugging face
Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments
Ruyun Hu, Lihao Fu, Yongcan Chen, Junyu Chen, Yu Qiao, Tong Si
bioRxiv 2022.08.11.503535
Antibody optimization enabled by artificial intelligence predictions of binding affinity and naturalness
Sharrol Bachas, Goran Rakocevic, David Spencer, Anand V. Sastry, Robel Haile, John M. Sutton, George Kasun, Andrew Stachyra, Jahir M. Gutierrez, Edriss Yassine, Borka Medjo, Vincent Blay, Christa Kohnert, Jennifer T. Stanton, Alexander Brown, Nebojsa Tijanic, Cailen McCloskey, Rebecca Viazzo, Rebecca Consbruck, Hayley Carter, Simon Levine, Shaheed Abdulhaqq, Jacob Shaul, Abigail B. Ventura, Randal S. Olson, Engin Yapici, Joshua Meier, Sean McClain, Matthew Weinstock, Gregory Hannum, Ariel Schwartz, Miles Gander, Roberto Spreafico
bioRxiv 2022.08.16.504181
7.2 Protein Language Models (PTM) and representation learning
Unified rational protein engineering with sequence-based deep representation learning
Alley, Ethan C., et al.
Nature methods 16.12 (2019)
Protein Structure Representation Learning by Geometric Pretraining
Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, Jian Tang
arXiv || Jan 2022
Evolutionary velocity with protein language models
Brian L. Hie, Kevin K. Yang, and Peter S. Kim
bioRxiv
Advancing protein language models with linguistics: a roadmap for improved interpretability
Mai Ha Vu, Rahmad Akbar, Philippe A. Robert, Bartlomiej Swiatczak, Victor Greiff, Geir Kjetil Sandve, Dag Trygve Truslew Haug
arXiv:2207.00982
7.3 Molecular Design Models
Unlike the function-scaffold-sequence paradigm in protein design, most deep learning models for molecular design work at one of three levels (atom-based, fragment-based or reaction-based), and they can be categorized as gradient optimization or optimized sampling (gradient-free). Click here for a detailed review.
To learn from a wider variety of generative models for design, the recent molecular design models recommended below might be helpful and may even be transferable to protein design. More papers are listed at CondaPereira's GitHub repo: Essay_For_Molecular_Generation.
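The difference between the two categories can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not taken from any listed paper: `property_score` is a made-up differentiable property over a three-dimensional latent vector, and `TARGET` is its hypothetical optimum. Gradient optimization follows the analytic gradient of the property; optimized sampling only evaluates the property on perturbed candidates and keeps the best one.

```python
import random

TARGET = (0.5, -1.0, 2.0)  # assumption: optimum of a made-up property


def property_score(z):
    """Toy differentiable property: higher is better, maximal at TARGET."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET))


def property_gradient(z):
    """Analytic gradient of property_score with respect to z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, TARGET)]


def gradient_optimization(z, steps=100, learning_rate=0.1):
    """Gradient ascent: needs a differentiable property model."""
    z = list(z)
    for _ in range(steps):
        z = [zi + learning_rate * gi for zi, gi in zip(z, property_gradient(z))]
    return z


def optimized_sampling(z, rounds=200, samples_per_round=20, sigma=0.3, seed=0):
    """Gradient-free search: perturb the current best and keep improvements."""
    rng = random.Random(seed)
    best, best_score = list(z), property_score(z)
    for _ in range(rounds):
        for _ in range(samples_per_round):
            candidate = [zi + rng.gauss(0.0, sigma) for zi in best]
            candidate_score = property_score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    print(property_score(gradient_optimization(start)))
    print(property_score(optimized_sampling(start)))
```

The gradient-free variant only needs property evaluations, so it also applies when the property comes from a non-differentiable oracle such as docking or a wet-lab assay, at the cost of many more evaluations.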
7.3.1 Gradient optimization
Inverse design of 3d molecular structures with conditional generative neural networks
Gebauer, Niklas WA, et al.
arXiv preprint arXiv:2109.04824 (2021) || code || Sept 21
Differentiable scaffolding tree for molecular optimization
Fu, T., Gao, W., Xiao, C., Yasonik, J., Coley, C. W., & Sun, J.
arXiv preprint arXiv:2109.10469 || code || Sept 21
LIMO: Latent Inceptionism for Targeted Molecule Generation
Eckmann, Peter, et al.
arXiv preprint arXiv:2206.09010 (2022) || code
Improving de novo molecular design with curriculum learning
Guo, Jeff, et al.
Nature Machine Intelligence (2022) || code
7.3.2 Optimized sampling
De novo drug design framework based on mathematical programming method and deep learning model
Yujing Zhao, Qilei Liu*, Xinyuan Wu, Lei Zhang, Jian Du*, Qingwei Meng.
AIChE Journal. 2022, e17748
Structure-based de novo drug design using 3D deep generative models
Li, Yibo, Jianfeng Pei, and Luhua Lai.
Chemical science 12.41 (2021)
A 3D Generative Model for Structure-Based Drug Design
Luo, Shitong, et al.
Advances in Neural Information Processing Systems 34 (2021)
CELLS: Cost-Effective Evolution in Latent Space for Goal-Directed Molecular Generation
Chen, Zhiyuan, et al.
arXiv preprint arXiv:2112.00905 (2021)
DrugEx v2: de novo design of drug molecules by Pareto-based multi-objective reinforcement learning in polypharmacology
Liu, Xuhan, et al.
Journal of cheminformatics 13.1 (2021) || DrugEx
Generating 3D Molecules for Target Protein Binding
Meng Liu, Youzhi Luo, Kanji Uchino, Koji Maruhashi, Shuiwang Ji
arXiv (2022) || GraphBP
Optimizing molecules using efficient queries from property evaluations
Hoffman, Samuel C., et al.
Nature Machine Intelligence 4.1 (2022)
Deep Evolutionary Learning for Molecular Design
K. Grantham, M. Mukaidaisi, H. K. Ooi, M. S. Ghaemi, A. Tchagang and Y. Li
IEEE Computational Intelligence Magazine, vol. 17, no. 2, pp. 14-28, May 2022
Fragment-Based Ligand Generation Guided by Geometric Deep Learning on Protein-Ligand Structure
Powers, Alexander, et al.
bioRxiv (2022)
Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
Peng, Xingang, et al.
arXiv preprint arXiv:2205.07249 (2022) || code