Structural tokenizer (`PdbQuantizer`) is too slow at processing long proteins

Open KatarinaYuan opened this issue 1 year ago • 9 comments

Hi team, thanks for the great work. How long does your pre-trained `PdbQuantizer` take to process a protein of length 400? On my machine it is extremely slow, and I am trying to figure out why.

Thanks for your help!

KatarinaYuan avatar Aug 15 '24 18:08 KatarinaYuan

Hi, thanks for your interest in our work. We will release an accelerated `PdbQuantizer` with multi-threaded parallel processing next month.

mingchen-li avatar Aug 16 '24 06:08 mingchen-li

Accelerated version speed:

| Protein Name (Uniprot_ID) | Length (local structures) | Splitting into local structures | Encoding |
|---|---|---|---|
| CCDB_ECOLI_Adkar_2012 | 101 | 0.29s | 4.43s |
| ESTA_BACSU_Nutschel_2020 | 212 | 0.67s | 4.27s |
| PTEN_HUMAN_Matreyek_2021 | 403 | 1.06s | 4.45s |
| ENV_HV1B9_DuenasDecamp_2016 | 853 | 3.24s | 5.63s |
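As a rough way to read the timings in the table above, the short sketch below adds the two stages and divides by length. The values are copied from the table; the function name `summarize` and the ms-per-residue metric are illustrative choices only, not part of ProSST.

```python
# Timings copied from the table above: (name, length, split_seconds, encode_seconds)
timings = [
    ("CCDB_ECOLI_Adkar_2012", 101, 0.29, 4.43),
    ("ESTA_BACSU_Nutschel_2020", 212, 0.67, 4.27),
    ("PTEN_HUMAN_Matreyek_2021", 403, 1.06, 4.45),
    ("ENV_HV1B9_DuenasDecamp_2016", 853, 3.24, 5.63),
]


def summarize(rows):
    """Return total time and milliseconds per residue for each row."""
    summary = []
    for name, length, split_s, encode_s in rows:
        total_s = split_s + encode_s
        summary.append(
            {
                "name": name,
                "length": length,
                "total_s": round(total_s, 2),
                "ms_per_residue": round(1000 * total_s / length, 1),
            }
        )
    return summary


summary = summarize(timings)
for row in summary:
    print(f'{row["name"]:<30} {row["length"]:>4} '
          f'{row["total_s"]:>6.2f}s {row["ms_per_residue"]:>6.1f} ms/residue')
```

On these four rows the encoding stage stays between 4.27 s and 5.63 s while the splitting stage grows with length, so the per-residue cost falls as proteins get longer.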

mingchen-li avatar Aug 16 '24 06:08 mingchen-li

Thanks! Looking forward to the release!

KatarinaYuan avatar Aug 30 '24 17:08 KatarinaYuan

Hello, when will it be released?

SIITW avatar Oct 17 '24 08:10 SIITW

Hi, please check the new `quantizer.py`. Make sure that you have installed `pathos` in your Python environment (`pip install pathos`).

mingchen-li avatar Oct 17 '24 11:10 mingchen-li

Still too slow on my side. Could you tell us how to use the new `quantizer.py`? I am using it as suggested:

    from prosst.structure.quantizer import PdbQuantizer
    processor = PdbQuantizer(structure_vocab_size=2048) # can be 20, 128, 512, 1024, 2048, 4096
    result = processor("example_data/p1.pdb", return_residue_seq=False)
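To narrow down where the time goes in a call like the one above, one option is to time and profile it with the standard library. The sketch below is only an assumption about how one might do that: `time_call`, `profile_call`, and `stand_in` are hypothetical helper names, and `stand_in` is a placeholder workload so the code runs without ProSST installed.

```python
import cProfile
import pstats
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn under cProfile and print the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(top)
    return result


def stand_in(n):
    """Placeholder workload (hypothetical), used instead of the real quantizer."""
    return sum(i * i for i in range(n))


result, seconds = time_call(stand_in, 100_000)
print(f"stand_in took {seconds:.4f}s")
profile_call(stand_in, 100_000, top=5)

# With ProSST installed, the same helpers would wrap the call from the snippet above:
# result, seconds = time_call(processor, "example_data/p1.pdb", return_residue_seq=False)
```

Note that `cProfile` only records the process it runs in, so any work done in worker processes would not appear in the printed profile; the wall-clock figure from `time_call` still covers the whole call.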

GGchen1997 avatar Dec 20 '24 16:12 GGchen1997

I want to know, too! It's super slow. :(

zz-lovely avatar Dec 23 '24 02:12 zz-lovely

It takes me about 2 hours to process 30 protein complexes, each 400 AA long. How can I accelerate this process?

Linzy19 avatar Jan 15 '25 11:01 Linzy19

We're excited to share that we've just merged a significant optimization contributed by mdanzi, which took a batch of 100 proteins from running in about 7 hours to running in about 80 seconds. Thanks again to mdanzi for the excellent contribution!
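For a sense of scale, the figures quoted in this thread can be turned into a speedup factor and per-protein times. This is plain arithmetic on the approximate numbers reported above; the helper name `seconds_per_item` is illustrative, and the two reports come from different machines and inputs, so the comparison is only rough.

```python
def seconds_per_item(total_seconds, n_items):
    """Average wall-clock seconds spent per item."""
    return total_seconds / n_items


# Batch of 100 proteins: about 7 hours before, about 80 seconds after.
before_s = 7 * 3600
after_s = 80
batch_size = 100

speedup = before_s / after_s
per_protein_before = seconds_per_item(before_s, batch_size)
per_protein_after = seconds_per_item(after_s, batch_size)

# Earlier report in this thread: about 2 hours for 30 protein complexes.
per_complex_reported = seconds_per_item(2 * 3600, 30)

print(f"speedup: about {speedup:.0f}x")
print(f"per protein: {per_protein_before:.0f}s before, {per_protein_after:.1f}s after")
print(f"earlier report: {per_complex_reported:.0f}s per complex")
```

The earlier report of roughly 240 s per complex is in the same range as the roughly 252 s per protein implied by the pre-optimization batch time.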

Tpan1039-ui avatar Feb 26 '25 12:02 Tpan1039-ui