
error getting sequence embeddings.

Open santule opened this issue 3 months ago • 4 comments

Hi, I am processing many sequences from a FASTA file, but the esm3-small model fails on the sequence "EHVAATHKTGLDALAELTGAALNSVEKLSELQFQTVRASLEDSTEQGKRVFDARSLHELTALQSEVSQPTEKLVAYGRHLYQIAAGTHAEWRKVAQTRA". I shortened the sequence to find where it fails. The first example below is the longest prefix that works; adding one more amino acid makes it fail.

Working

import numpy as np
import esm.sdk
from esm.sdk.api import ESMProtein, SamplingConfig

model = esm.sdk.client("esm3-small", token=my_token)
protein = ESMProtein(sequence="EHVAATHKTGLDALAEL")
protein_tensor = model.encode(protein)

# Request the mean embedding and convert it to a NumPy array
output2 = np.array(
    model.forward_and_sample(
        protein_tensor, SamplingConfig(return_mean_embedding=True)
    ).mean_embedding
)
print(output2.shape)
(1536,)

Not Working

model = esm.sdk.client("esm3-small", token=my_token)
protein = ESMProtein(sequence="EHVAATHKTGLDALAELT")
protein_tensor = model.encode(protein)

output2 = np.array(
    model.forward_and_sample(
        protein_tensor, SamplingConfig(return_mean_embedding=True)
    ).mean_embedding
)
print(output2.shape)

/usr/local/lib/python3.10/dist-packages/esm/sdk/forge.py in forward_and_sample(self, input, sampling_configuration)
    326         }
    327 
--> 328         req["sequence"] = maybe_list(input.sequence)
    329         req["structure"] = maybe_list(input.structure)
    330         req["secondary_structure"] = maybe_list(input.secondary_structure)

AttributeError: 'ESMProteinError' object has no attribute 'sequence'
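
Reading the traceback, the object passed to forward_and_sample is an ESMProteinError rather than an encoded protein, which suggests that model.encode returned the error as a value instead of raising it. A minimal sketch of how a batch loop could guard against that is below. It is not the SDK's own method: embed_all, FakeError, fake_encode and fake_embed are hypothetical names used only for illustration, and the 17-residue cutoff in fake_encode merely imitates the behaviour seen above, it is not a claim about the real cause.

```python
from typing import Any, Callable, Dict, List, Tuple


def embed_all(
    sequences: List[str],
    encode: Callable[[str], Any],
    embed: Callable[[Any], Any],
    error_type: type,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Embed each sequence, collecting returned error objects instead of crashing."""
    embeddings: Dict[str, Any] = {}
    failures: Dict[str, Any] = {}
    for seq in sequences:
        encoded = encode(seq)
        # The client may return an error object rather than raise it,
        # so check the type before passing the result on.
        if isinstance(encoded, error_type):
            failures[seq] = encoded
            continue
        result = embed(encoded)
        if isinstance(result, error_type):
            failures[seq] = result
            continue
        embeddings[seq] = result
    return embeddings, failures


# Stand-ins for illustration only (hypothetical, not the esm SDK).
class FakeError:
    def __init__(self, msg: str) -> None:
        self.msg = msg


def fake_encode(seq: str) -> Any:
    # Pretend the service rejects sequences longer than 17 residues.
    return FakeError("rejected") if len(seq) > 17 else list(seq)


def fake_embed(encoded: Any) -> Any:
    # Pretend the "embedding" is just the number of residues.
    return len(encoded)


ok, bad = embed_all(
    ["EHVAATHKTGLDALAEL", "EHVAATHKTGLDALAELT"],
    fake_encode,
    fake_embed,
    FakeError,
)
print(sorted(ok), sorted(bad))
```

With the real client, one would presumably pass a wrapper around model.encode as encode, a wrapper around model.forward_and_sample as embed, and ESMProteinError as error_type (assuming it can be imported from esm.sdk.api). Inspecting the collected error objects should then show why the server rejected a given sequence.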

Am I using the correct model? I did not get this error with the open model on Hugging Face.

Thanks for your help.

Regards,
Sanjana

santule avatar Nov 14 '24 02:11 santule