PLMSearch
PLMSearch enables accurate and fast homologous protein search with only sequences as input
- 2024.6.5 Update: We have uploaded the Dataset of PLMSearch & PLMAlign to Zenodo.
- 2024.5.30 Update: We have uploaded the Dataset of PLMSearch Web Server to Zenodo.
This is the implementation of "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". By using a protein language model, PLMSearch achieves sensitivity close to that of SOTA (state-of-the-art) structure search methods, while remaining versatile and fast because it relies only on sequences.

Quick links
- Webserver
- Requirements
- Data preparation
- Reproduce all our experiments with only one file
- Run PLMSearch locally
- Citation
Webserver
PLMSearch web server: dmiip.sjtu.edu.cn/PLMSearch 🚀
PLMAlign web server: dmiip.sjtu.edu.cn/PLMAlign ✈️
PLMAlign source code: github.com/maovshao/PLMAlign 🚁
Requirements
Follow the steps in requirements.sh
Data preparation
We have released our experiment data, which can be downloaded from plmsearch_data or Zenodo.
# Include experiment data, PLMSearch model, ESM-1b model, etc.
# Use the following command or download it from https://zenodo.org/records/11480660
wget https://dmiip.sjtu.edu.cn/PLMSearch/static/download/plmsearch_data.tar.gz
tar zxvf plmsearch_data.tar.gz
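Where `wget` or `tar` is unavailable, the same two steps can be sketched in Python using only the standard library. This is a minimal illustration and not part of the PLMSearch repository: the helper names `download` and `extract` are hypothetical, and the only value taken from this README is the download URL.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
DATA_URL = "https://dmiip.sjtu.edu.cn/PLMSearch/static/download/plmsearch_data.tar.gz"


def download(url: str, archive: Path) -> Path:
    """Fetch url into archive unless that file already exists (hypothetical helper)."""
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return archive


def extract(archive: Path, dest: Path) -> list:
    """Unpack a .tar.gz into dest and return the member names (hypothetical helper)."""
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries that would be written outside dest.
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return [member.name for member in members]
```

Calling `extract(download(DATA_URL, Path("plmsearch_data.tar.gz")), Path("."))` mirrors the `wget` and `tar zxvf` commands above. The download is skipped when the archive is already present, so the call can be repeated safely.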
Reproduce all our experiments with only one file
- Reproduce all our experiments with good visualization by following the steps in main.ipynb
Notice: Detailed results are saved in scientist_figures/.
Run PLMSearch locally
- Run PLMSearch locally by following the example in pipeline.ipynb
Notice: the inputs and outputs of the example are saved in example/.
Citation
Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5