Thank you for the work you have done. This question pertains to downstream tasks.

Open lllastronaut opened this issue 1 year ago • 5 comments

The size of the representation generated by your model depends on the number of residues in the protein: the model produces one vector per residue, so proteins of different lengths yield matrices of different sizes. For downstream tasks, have you compared which method of obtaining a fixed-length representation works best, such as padding, embedding into a fixed-size vector, or other methods?

lllastronaut, Sep 05 '24
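For reference, a minimal sketch of the padding option mentioned in the question, assuming the per-residue embeddings arrive as NumPy arrays of shape [L, hidden] (768 for the model discussed in this thread). Proteins shorter than a chosen `max_len` are zero-padded and longer ones are truncated; the returned boolean mask marks real residues so a downstream model can ignore the padding. `pad_embeddings` and `max_len` are illustrative names, not part of ProSST.

```python
import numpy as np


def pad_embeddings(embeddings, max_len):
    """Pad or truncate a list of [L_i, hidden] arrays to one [batch, max_len, hidden] array.

    Returns the padded batch and a boolean mask that is True at real residues.
    (Hypothetical helper for illustration, not a ProSST function.)
    """
    hidden = embeddings[0].shape[1]
    batch = np.zeros((len(embeddings), max_len, hidden), dtype=np.float32)
    mask = np.zeros((len(embeddings), max_len), dtype=bool)
    for i, emb in enumerate(embeddings):
        n = min(emb.shape[0], max_len)  # truncate sequences longer than max_len
        batch[i, :n] = emb[:n]          # copy real residues; the rest stays zero
        mask[i, :n] = True
    return batch, mask


# Two proteins of different lengths (random stand-ins for model outputs).
rng = np.random.default_rng(0)
embs = [rng.normal(size=(5, 768)), rng.normal(size=(12, 768))]
batch, mask = pad_embeddings(embs, max_len=10)
print(batch.shape)       # (2, 10, 768)
print(mask.sum(axis=1))  # [ 5 10]
```

Padding keeps per-residue detail but needs a downstream model that respects the mask (for example a small CNN or transformer head), and truncation discards residues beyond `max_len`; pooling instead collapses each protein into a single vector.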

Mean pooling and attention pooling both worked well in our tests (see https://huggingface.co/AI4Protein/ProSST-2048/blob/main/modeling_prosst.py#L221).

mingchen-li, Sep 05 '24
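A minimal sketch of the two pooling strategies named in the reply, written in NumPy so it runs without the model. It is not the code at the linked line of `modeling_prosst.py`, whose details may differ. It assumes per-residue hidden states of shape [batch, L, d] plus a boolean mask marking real residues. In attention pooling the scoring vector `w` would normally be a learned parameter; here it is passed in by hand, and the function names are hypothetical.

```python
import numpy as np


def mean_pool(hidden, mask):
    """Masked mean over residues: [batch, L, d] -> [batch, d]."""
    m = mask[..., None].astype(hidden.dtype)    # [batch, L, 1]
    summed = (hidden * m).sum(axis=1)           # padded positions contribute 0
    counts = np.clip(m.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts


def attention_pool(hidden, mask, w):
    """Attention pooling with a scoring vector w of shape [d] (learned in practice).

    Assumes every sequence has at least one unmasked residue.
    """
    scores = hidden @ w                                # [batch, L]
    scores = np.where(mask, scores, -np.inf)           # padded positions get zero weight
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over residues
    return (weights[..., None] * hidden).sum(axis=1)   # [batch, d]


rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 6, 768))
mask = np.array([[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1]], dtype=bool)
w = rng.normal(size=768)
print(mean_pool(hidden, mask).shape)          # (2, 768)
print(attention_pool(hidden, mask, w).shape)  # (2, 768)
```

Mean pooling has no parameters, which makes it a convenient default for frozen embeddings; attention pooling lets a downstream head learn which residues matter, at the cost of training `w`. With `w` set to zeros, attention pooling reduces exactly to mean pooling, which is a useful sanity check.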

How do you extract the embedding vector? Thank you very much.

Fakerwws, Jan 06 '25

Extract the embeddings from the output of a specific layer of the pre-trained model.

lllastronaut, Jan 10 '25
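As an illustration of taking the output of a specific layer: if the model is called with `output_hidden_states=True` and follows the usual Hugging Face convention, the hidden states come back as a tuple with one [batch, seq_len, d] tensor per layer, with the embedding layer at index 0. The sketch below uses NumPy stand-ins rather than the real model (PyTorch tensors would first be converted with `.cpu().numpy()`). It picks one layer and strips special tokens to leave an [n, d] per-residue matrix. The helper name is hypothetical, and the assumption that the tokenizer adds exactly one start and one end token should be checked against your tokenized inputs.

```python
import numpy as np


def layer_embedding(hidden_states, layer, attention_mask, n_prefix=1, n_suffix=1):
    """Return per-residue embeddings [n, d] for one sequence from one layer.

    hidden_states: tuple of arrays, each [1, seq_len, d]; index 0 is assumed to be
        the embedding layer and index -1 the last layer (Hugging Face layout).
    attention_mask: [1, seq_len], 1 for real tokens (including specials), 0 for padding.
    n_prefix / n_suffix: special tokens to strip (assumed 1 each; check your tokenizer).
    """
    h = np.asarray(hidden_states[layer])[0]            # [seq_len, d]
    length = int(np.asarray(attention_mask)[0].sum())  # real tokens incl. specials
    return h[n_prefix:length - n_suffix]               # residues only


# Stand-in for outputs.hidden_states: embedding output + 12 layers (arbitrary here),
# 7 tokens (start token, 5 residues, end token), hidden size 768.
rng = np.random.default_rng(0)
hidden_states = tuple(rng.normal(size=(1, 7, 768)) for _ in range(13))
attention_mask = np.ones((1, 7), dtype=int)
last = layer_embedding(hidden_states, -1, attention_mask)
print(last.shape)  # (5, 768)
```

Any intermediate layer can be selected the same way by passing its index instead of `-1`; the resulting [n, d] matrix can then be padded or pooled as discussed earlier in the thread.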

I am currently extracting an [n, 768] embedding matrix from the last hidden layer, where n is the sequence length.

Fakerwws, Jan 10 '25

How did you solve this problem? Running

structure_sequence_offset = [i + 3 for i in structure_sequence]

raises:

TypeError: can only concatenate str (not "int") to str

MichaelAbel1, Mar 04 '25
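The `TypeError` above means the elements of `structure_sequence` are strings rather than integers: iterating over a string yields single characters, and `i + 3` then fails. One possible fix, assuming the structure tokens were read from a file as text, is to parse them into integers before applying the `+ 3` offset. `parse_structure_tokens` is a hypothetical helper, and the accepted formats (a list, or integers separated by commas and/or whitespace) are assumptions about how the tokens were stored; the offset of 3 is taken unchanged from the line in the question.

```python
def parse_structure_tokens(structure_sequence):
    """Convert structure tokens to a list of ints (hypothetical helper).

    Accepts a list of ints, a list of numeric strings, or a single string of
    integers separated by commas and/or whitespace (e.g. read from a file).
    """
    if isinstance(structure_sequence, str):
        parts = structure_sequence.replace(",", " ").split()
    else:
        parts = structure_sequence
    return [int(p) for p in parts]


structure_sequence = "12, 407, 33 1980"  # hypothetical tokens stored as text
structure_sequence_offset = [i + 3 for i in parse_structure_tokens(structure_sequence)]
print(structure_sequence_offset)  # [15, 410, 36, 1983]
```

If the tokens are stored in another format (for example without separators), the parsing step has to match that format; the key point is that the list comprehension needs integers, not strings.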