Structural tokenizer (`PdbQuantizer`) is too slow at processing long proteins
Hi team,
Thanks for the great work. How long should it take your pre-trained PdbQuantizer to process a protein of length 400? On my machine it is surprisingly slow, and I am trying to figure out why.
Thanks for your help!
Hi, thanks for your interest in our work. We will release an accelerated PdbQuantizer with multithreaded parallel processing next month.
Timings for the accelerated version:
| Protein name (UniProt ID) | Length (no. of local structures) | Splitting into local structures | Encoding |
|---|---|---|---|
| CCDB_ECOLI_Adkar_2012 | 101 | 0.29s | 4.43s |
| ESTA_BACSU_Nutschel_2020 | 212 | 0.67s | 4.27s |
| PTEN_HUMAN_Matreyek_2021 | 403 | 1.06s | 4.45s |
| ENV_HV1B9_DuenasDecamp_2016 | 853 | 3.24s | 5.63s |
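The numbers above suggest that splitting scales roughly linearly with protein length, while encoding time is nearly constant across lengths, which points to a fixed start-up overhead rather than per-residue work. A quick sketch computing per-residue splitting cost from the reported figures (data copied from the table; no library calls assumed):

```python
# Reported timings: length -> (splitting_s, encoding_s), copied from the table above.
timings = {
    101: (0.29, 4.43),
    212: (0.67, 4.27),
    403: (1.06, 4.45),
    853: (3.24, 5.63),
}

for length, (split_s, encode_s) in timings.items():
    # Splitting cost per residue stays in the low-ms range; encoding barely
    # changes with length, consistent with a constant overhead.
    print(f"{length:4d} residues: {1000 * split_s / length:.1f} ms/residue split, "
          f"{encode_s:.2f}s encode")
```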
Thanks! Looking forward to the release!
Hello, when will it be released?
Hi, please check the new quantizer.py. (Make sure that you have installed pathos in your Python environment: `pip install pathos`.)
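For anyone curious how pathos-style parallelism speeds this up: below is a minimal sketch of quantizing many PDB files in parallel. It uses the stdlib `ProcessPoolExecutor`; pathos's `ProcessingPool` exposes an equivalent `map()` interface but pickles with dill, so it can also ship bound methods to workers. `quantize_one` here is a hypothetical stand-in for a real PdbQuantizer call, not the repo's actual function.

```python
from concurrent.futures import ProcessPoolExecutor

def quantize_one(pdb_path: str) -> tuple[str, int]:
    # Stand-in for: PdbQuantizer(structure_vocab_size=2048)(pdb_path).
    # Returns a placeholder "token count" so the sketch is self-contained.
    return pdb_path, len(pdb_path)

def quantize_batch(pdb_paths: list[str], workers: int = 4) -> dict[str, int]:
    # Each worker process handles one protein at a time; CPU-bound work like
    # structure splitting benefits from processes rather than threads.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(quantize_one, pdb_paths))

if __name__ == "__main__":
    print(quantize_batch(["p1.pdb", "p2.pdb", "p3.pdb"]))
```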
Still too slow on my side. Could you tell us how to use the new quantizer.py? I am using it as suggested:
```python
from prosst.structure.quantizer import PdbQuantizer

processor = PdbQuantizer(structure_vocab_size=2048)  # vocab size can be 20, 128, 512, 1024, 2048, or 4096
result = processor("example_data/p1.pdb", return_residue_seq=False)
```
I want to know too! It's super slow. :(
It takes me about 2 hours to process 30 protein complexes of about 400 AA each. How can I accelerate this process?
We're excited to share that we've just merged a significant optimization contributed by mdanzi, which took a batch of 100 proteins from about 7 hours down to about 80 seconds. Thanks again to mdanzi for the excellent contribution!