LogitsConfig.__init__() got an unexpected keyword argument 'ith_hidden_layer'
How can I extract ESMC_6B embeddings for sequences? I tried to extract protein embeddings by following the instructions in https://github.com/evolutionaryscale/esm/blob/main/cookbook/tutorials/2_embed.ipynb.
ESMC_6B_EMBEDDING_CONFIG = LogitsConfig(return_hidden_states=True, ith_hidden_layer=55)
TypeError                                 Traceback (most recent call last)
Cell In[40], line 1
----> 1 ESMC_6B_EMBEDDING_CONFIG = LogitsConfig(return_hidden_states=True, ith_hidden_layer=55)

TypeError: LogitsConfig.__init__() got an unexpected keyword argument 'ith_hidden_layer'
After spending several hours comparing the source code installed by pip install esm with this repo, I found that installing esm3 directly with
pip install esm
causes the problem, because the esm pip package does not seem to have been updated yet. You can try setting up the environment with
# Clone this repo first
git clone https://github.com/evolutionaryscale/esm.git
conda create -n esm python=3.10
conda activate esm
# cd into the directory of this repo
cd esm
# Install the package and its dependencies from this repo's pyproject.toml
python -m pip install .
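Before re-running the notebook, it can help to confirm that the environment actually picked up the source install. The following is a minimal sketch, not part of the esm package: `installed_version`, `accepts_keyword`, and `check_esm_install` are hypothetical helper names, and the sketch assumes the distribution is named `esm` (as in the pip command above) and that `LogitsConfig` is importable from `esm.sdk.api`.

```python
import inspect
from importlib import metadata
from typing import Callable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def accepts_keyword(func: Callable, name: str) -> bool:
    """Report whether `func` (or a class constructor) accepts keyword `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def check_esm_install() -> dict:
    """Summarize the installed esm version and whether LogitsConfig takes ith_hidden_layer."""
    report = {"version": installed_version("esm"), "ith_hidden_layer": None}
    try:
        # Assumption: LogitsConfig lives in esm.sdk.api.
        from esm.sdk.api import LogitsConfig
    except ImportError:
        return report  # esm is not importable in this environment
    report["ith_hidden_layer"] = accepts_keyword(LogitsConfig, "ith_hidden_layer")
    return report
```

Calling `print(check_esm_install())` inside the new conda environment shows the installed version and whether the keyword is supported. If `ith_hidden_layer` comes back `False`, the environment is probably still using the older pip package.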
To embed a single sequence with ESMC_6B, you can use the following example:
# Modify the LogitsConfig and model setting to ESMC_6B first
from esm.sdk import client
from esm.sdk.api import ESM3InferenceClient, ESMProtein, LogitsConfig, LogitsOutput

# Replace YOUR_TOKEN with your own Forge API token
model = client(
    model="esmc-6b-2024-12", url="https://forge.evolutionaryscale.ai", token=YOUR_TOKEN
)

# Suppose you want the embedding from the last hidden layer of ESMC_6B (layer 80, hence index 79)
ESMC_6B_EMBEDDING_CONFIG = LogitsConfig(
    sequence=True, return_embeddings=True, return_hidden_states=True, ith_hidden_layer=79
)


def embed_sequence(model: ESM3InferenceClient, sequence: str) -> LogitsOutput:
    # ESMC error messages can be hard to track down, so uncomment the prints to inspect each step
    protein = ESMProtein(sequence=sequence)
    # print(protein)
    protein_tensor = model.encode(protein)
    # print(protein_tensor)
    output = model.logits(protein_tensor, ESMC_6B_EMBEDDING_CONFIG)
    # print(output)
    return output


sequence = "AAAAA"
logits_output = embed_sequence(model, sequence)
# print(logits_output.logits, logits_output.embeddings, logits_output.hidden_states)
# Check that the hidden states were extracted successfully
print(logits_output.hidden_states)
I think this will solve your problem.
Cheers, Ryan
Thank you for the detailed instructions! The issue was resolved after I successfully installed esm-3.1.3 as you suggested. Embedding extraction now works as expected.
I really appreciate your help!
Happy to help! :tada::tada::tada:
I was experiencing the same issue, and following the suggestion by @Ryan-Hu-Hu-Hu resolved it. However, when I proceeded to the next step in my code to extract embeddings, I received the following error:
TypeError: ESM3SageMakerClient._post() got an unexpected keyword argument 'return_bytes'
The code that I am using is:
model_name_for_esmc = get_model_name(model_package_arn)
model = ESM3SageMakerClient(
    endpoint_name=ENDPOINT_NAME,
    model=model_name_for_esmc,
)
protein = ESMProtein(sequence="AAAAA")
protein_tensor = model.encode(protein)
logits_output = model.logits(protein_tensor, LogitsConfig(sequence=True, return_embeddings=True))
print("Logits:", logits_output.logits)
print("Embeddings:", logits_output.embeddings)
This error goes away with esm==3.1.1, but then the LogitsConfig error comes back.
I am using SageMaker for this. I have created an endpoint and deployed the ESMC-300M model there.
Any suggestions/workarounds are appreciated. Thank you!
Are these still issues on the latest version? I believe we fixed most of this.