How to generate embeddings for very long proteins using ESMC
Dear ESMC Developer,
I am attempting to generate embeddings for a set of proteins, including a particularly large one (>34,000 residues): ENSMUSP00000097561.4, the protein encoded by the mouse heart muscle gene titin. However, I run into a CUDA out-of-memory error when processing this sequence.
I am using the following code snippet:
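```python
from esm.sdk.api import ESM3InferenceClient, ESMProtein, LogitsConfig, LogitsOutput

# Request sequence logits, per-residue embeddings, and every layer's hidden states.
EMBEDDING_CONFIG = LogitsConfig(
    sequence=True, return_embeddings=True, return_hidden_states=True
)

def embed_sequence(model: ESM3InferenceClient, sequence: str) -> LogitsOutput:
    # Wrap the raw amino-acid string and tokenize it into a protein tensor.
    protein = ESMProtein(sequence=sequence)
    protein_tensor = model.encode(protein)
    # One forward pass over the full sequence; memory use grows with its length.
    output = model.logits(protein_tensor, EMBEDDING_CONFIG)
    return output
```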
Could you please suggest an approach for handling proteins this long?
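One workaround often used when a sequence exceeds GPU memory is to embed overlapping windows and stitch the per-residue embeddings back together. The sketch below is a hypothetical, pure-Python helper: the names `plan_windows` and `stitch`, and any window or overlap sizes, are assumptions for illustration, not part of the ESM SDK or a documented ESMC limit. It only plans the windows and reassembles per-residue rows, so it runs without a GPU.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# (start, end, keep_start, keep_end): residues [start, end) are fed to the model;
# only residues [keep_start, keep_end) of that window are kept when stitching.
Window = Tuple[int, int, int, int]


def plan_windows(length: int, window: int, overlap: int) -> List[Window]:
    """Hypothetical helper: split [0, length) into overlapping windows
    whose kept parts tile the whole sequence exactly once."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if length <= window:
        return [(0, length, 0, length)]
    stride = window - overlap
    # Regular strides, plus one final window flush with the sequence end.
    starts = list(range(0, length - window, stride)) + [length - window]
    plan: List[Window] = []
    keep_start = 0
    for i, start in enumerate(starts):
        end = start + window
        if i + 1 < len(starts):
            # Cut in the middle of the overlap so kept residues have
            # context on both sides wherever possible.
            keep_end = (starts[i + 1] + end) // 2
        else:
            keep_end = length
        plan.append((start, end, keep_start, keep_end))
        keep_start = keep_end
    return plan


def stitch(chunks: Sequence[Sequence[T]], plan: Sequence[Window]) -> List[T]:
    """Hypothetical helper: concatenate the kept per-residue rows of each window."""
    out: List[T] = []
    for rows, (start, end, keep_start, keep_end) in zip(chunks, plan):
        if len(rows) != end - start:
            raise ValueError("chunk length does not match its window")
        out.extend(rows[keep_start - start : keep_end - start])
    return out
```

In use, each window's `sequence[start:end]` would be passed through `embed_sequence`, with any special start/end token positions dropped from the returned embeddings before calling `stitch` (the resulting list of per-residue rows can then be stacked into one tensor). Each residue then sees at most `window` residues of context, so the stitched result approximates, rather than reproduces, a single full-length pass.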
Thank you for your time and assistance.
Best regards, Soumitra