ProSST Thank you for the work you have done. This question pertains to downstream tasks.

The dimension of the vector representations generated by your model is determined by the number of residues in the protein. Have you compared which method for unifying vector lengths works best in downstream tasks, such as padding, embedding to a fixed dimension, or other methods?

Sep 05 '24 03:09 lllastronaut

Mean pooling and attention pooling both worked well in our tests. See https://huggingface.co/AI4Protein/ProSST-2048/blob/main/modeling_prosst.py#L221
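
As a rough illustration of the two pooling strategies named above, here is a minimal NumPy sketch. It is not the code from the linked modeling_prosst.py; the function names, the padding-mask convention, and the fixed `query` vector (which would normally be a learned parameter) are assumptions made for the example. It also assumes every sequence has at least one real residue.

```python
import numpy as np

def mean_pool(residue_emb, mask):
    """Average per-residue embeddings over real (non-padding) positions.

    residue_emb: float array of shape [batch, max_len, hidden]
    mask:        array of shape [batch, max_len], 1 = residue, 0 = padding
    """
    m = mask.astype(residue_emb.dtype)[..., None]    # [batch, max_len, 1]
    summed = (residue_emb * m).sum(axis=1)           # [batch, hidden]
    counts = np.clip(m.sum(axis=1), 1.0, None)       # [batch, 1], avoid divide-by-zero
    return summed / counts

def attention_pool(residue_emb, mask, query):
    """Softmax-weighted average; `query` is a hypothetical [hidden] vector."""
    scores = residue_emb @ query                     # [batch, max_len]
    scores = np.where(mask.astype(bool), scores, -np.inf)  # padding gets zero weight
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights[..., None] * residue_emb).sum(axis=1)  # [batch, hidden]

# Two proteins padded to length 5; the first has only 3 real residues.
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 5, 768))
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
pooled_mean = mean_pool(emb, mask)
pooled_attn = attention_pool(emb, mask, rng.normal(size=768))
```

Both functions map a variable-length [n, hidden] representation to a single [hidden] vector per protein, so downstream models see a fixed input size regardless of sequence length.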

Sep 05 '24 03:09 mingchen-li

How do you extract the embedding vector? Thank you very much.

Jan 06 '25 02:01 Fakerwws

Extract embeddings at the output of a specific layer of the pre-trained model.

Jan 10 '25 03:01 lllastronaut

I am currently extracting an embedding of shape [n, 768] from the last hidden layer.

Jan 10 '25 05:01 Fakerwws

How did you solve this problem?

structure_sequence_offset = [i + 3 for i in structure_sequence]
TypeError: can only concatenate str (not "int") to str

Mar 04 '25 14:03 MichaelAbel1
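
The TypeError above means that the elements of structure_sequence are strings, so `i + 3` tries to concatenate a str with an int. One likely cause, assumed here, is that the structure tokens were read from a text file or passed as a comma-separated string without being converted to integers. A minimal sketch of a fix follows; the helper name `offset_structure_tokens` is hypothetical, and the offset of 3 is taken as-is from the quoted line.

```python
def offset_structure_tokens(structure_sequence, offset=3):
    """Return integer structure tokens shifted by `offset`.

    Accepts a list of ints, a list of numeric strings, or a single
    comma-separated string (the accepted formats are assumptions).
    """
    if isinstance(structure_sequence, str):
        # Split "1,2,3" into pieces and drop empty ones (e.g. a trailing comma).
        structure_sequence = [t for t in structure_sequence.split(",") if t.strip()]
    # int() converts numeric strings and leaves ints unchanged.
    return [int(token) + offset for token in structure_sequence]

structure_sequence_offset = offset_structure_tokens(["1", "2", "2047"])
```

If the tokens are already integers, the original list comprehension works unchanged, so checking `type(structure_sequence[0])` before applying the offset is a quick way to confirm the cause.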