Thank you for the work you have done. This question concerns downstream tasks.
The length of the representation generated by your model depends on the number of residues in the protein (one vector per residue), so different proteins yield different-length outputs. Have you compared which method of unifying the representation length works best in downstream tasks, such as padding, projecting to a fixed dimension, or other methods?
Mean pooling and attention pooling both work well in our tests. (https://huggingface.co/AI4Protein/ProSST-2048/blob/main/modeling_prosst.py#L221)
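For reference, a minimal sketch of mean pooling per-residue embeddings into one fixed-length vector per protein; the tensor names and shapes are illustrative assumptions, not taken from modeling_prosst.py:

```python
import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average per-residue embeddings into a single fixed-size vector per protein.

    hidden_states:  [batch, seq_len, hidden_dim] per-residue embeddings
    attention_mask: [batch, seq_len] with 1 for real residues, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).float()   # [batch, seq_len, 1]
    summed = (hidden_states * mask).sum(dim=1)    # sum only over real residues
    counts = mask.sum(dim=1).clamp(min=1e-9)      # avoid division by zero
    return summed / counts                        # [batch, hidden_dim]
```

Masking before averaging keeps padded positions from diluting the pooled vector when sequences of different lengths are batched together.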
How do you extract the embedding vector? Thank you very much.
You can extract embeddings from the output of a specific layer of the pre-trained model.
I am currently extracting an embedding of shape [n, 768] from the last hidden layer.
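For context, a minimal sketch of pulling per-residue embeddings from the last hidden layer using the generic Hugging Face Transformers pattern. The example sequence and the call are illustrative assumptions: ProSST's forward pass also expects a structure-token sequence, so check modeling_prosst.py for the exact arguments.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative sketch only; adapt the inputs to ProSST's actual forward signature,
# which additionally takes a structure-token sequence.
model_name = "AI4Protein/ProSST-2048"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # example amino-acid sequence
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Per-residue embeddings from the last hidden layer: [1, n, 768].
last_hidden = outputs.hidden_states[-1]
# Any intermediate layer can be used instead, e.g. outputs.hidden_states[6].
per_residue = last_hidden.squeeze(0)  # [n, 768]
```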
How did you solve this problem?
structure_sequence_offset = [i + 3 for i in structure_sequence]
TypeError: can only concatenate str (not "int") to str
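The traceback indicates that each element of structure_sequence is a string, so i + 3 attempts string concatenation. A minimal sketch of one likely fix, assuming the structure tokens arrive as a comma-separated string (the example value is purely illustrative):

```python
# structure_sequence arrives as comma-separated string tokens, e.g. "12,45,7";
# convert each token to int before applying the vocabulary offset.
raw = "12,45,7"  # illustrative value; replace with your actual structure sequence
structure_sequence = [int(tok) for tok in raw.split(",")]

# The +3 offset now operates on integers instead of failing on strings.
structure_sequence_offset = [i + 3 for i in structure_sequence]
print(structure_sequence_offset)  # [15, 48, 10]
```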