
LogitsConfig.__init__() got an unexpected keyword argument 'ith_hidden_layer'

Open zw-SIMM opened this issue 11 months ago • 5 comments

How can I extract ESMC_6B embeddings of sequences? I tried to extract protein embeddings following the instructions in https://github.com/evolutionaryscale/esm/blob/main/cookbook/tutorials/2_embed.ipynb.

ESMC_6B_EMBEDDING_CONFIG = LogitsConfig(return_hidden_states=True, ith_hidden_layer=55)

TypeError                                 Traceback (most recent call last)
Cell In[40], line 1
----> 1 ESMC_6B_EMBEDDING_CONFIG = LogitsConfig(return_hidden_states=True, ith_hidden_layer=55)

TypeError: LogitsConfig.__init__() got an unexpected keyword argument 'ith_hidden_layer'

zw-SIMM avatar Jan 17 '25 02:01 zw-SIMM

After spending several hours comparing the source code from the pip-installed esm package with this repo, I found that installing esm3 directly via

pip install esm

causes the problem, because the esm package on PyPI does not seem to have been updated yet. You can try installing the environment from source instead:

# clone this repo first
git clone https://github.com/evolutionaryscale/esm.git
conda create -n esm python=3.10
conda activate esm
# cd into the cloned repo, then install it along with the
# dependencies declared in its pyproject.toml
cd esm
python -m pip install .
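As a quick sanity check after the source install (a minimal sketch, using the import path from the tutorial notebook), LogitsConfig should now accept the ith_hidden_layer keyword without raising a TypeError:

# Verify the source install exposes the newer LogitsConfig fields.
# The ith_hidden_layer value is just the one from the original report.
from esm.sdk.api import LogitsConfig

config = LogitsConfig(return_hidden_states=True, ith_hidden_layer=55)
print(config)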

To embed a single sequence with ESMC_6B, you can use the following example:

# Point the client at ESMC_6B and adjust the LogitsConfig accordingly
from esm.sdk import client
from esm.sdk.api import ESMProtein, ESM3InferenceClient, LogitsConfig, LogitsOutput

model = client(
    model="esmc-6b-2024-12", url="https://forge.evolutionaryscale.ai", token=YOUR_TOKEN
)
# ESMC_6B has 80 hidden layers; to extract the embedding of the last
# one, request the 0-indexed layer 79
ESMC_6B_EMBEDDING_CONFIG = LogitsConfig(
    sequence=True, return_embeddings=True, return_hidden_states=True, ith_hidden_layer=79
)

def embed_sequence(model: ESM3InferenceClient, sequence: str) -> LogitsOutput:
    # ESMC error messages can be hard to trace, so uncomment the prints
    # below to inspect each intermediate object while debugging
    protein = ESMProtein(sequence=sequence)
    # print(protein)
    protein_tensor = model.encode(protein)
    # print(protein_tensor)
    output = model.logits(protein_tensor, ESMC_6B_EMBEDDING_CONFIG)
    # print(output)
    return output

sequence = "AAAAA"
logits_output = embed_sequence(model, sequence)
# print(logits_output.logits, logits_output.embeddings, logits_output.hidden_states)
# Check that the hidden states were successfully extracted
print(logits_output.hidden_states)
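If you then want a single fixed-length vector per sequence, a common next step (not part of the tutorial, so treat this as a sketch) is to mean-pool the per-residue hidden states; this assumes hidden_states is a torch tensor with the residue dimension on axis -2:

# Hypothetical pooling step: average over the residue axis to get one
# embedding vector for the whole sequence.
per_residue = logits_output.hidden_states
sequence_embedding = per_residue.mean(dim=-2)
print(sequence_embedding.shape)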

I think this will solve your problem.

Cheers, Ryan

Ryan-Hu-Hu-Hu avatar Jan 20 '25 18:01 Ryan-Hu-Hu-Hu

Thank you for the detailed instructions! The issue was resolved after I successfully installed esm-3.1.3 as you suggested. The embedding extraction works as expected now.

I really appreciate your help!

zw-SIMM avatar Jan 21 '25 01:01 zw-SIMM

Happy to help! :tada::tada::tada:

Ryan-Hu-Hu-Hu avatar Jan 21 '25 06:01 Ryan-Hu-Hu-Hu

I was experiencing the same issue, and following the suggestion by @Ryan-Hu-Hu-Hu it was resolved. However, upon proceeding to the next step in my code to extract embeddings, I received the following error: TypeError: ESM3SageMakerClient._post() got an unexpected keyword argument 'return_bytes'

The code that I am using is:

# ENDPOINT_NAME, model_package_arn, and get_model_name come from my
# SageMaker setup, where the ESMC-300M model is deployed
model_name_for_esmc = get_model_name(model_package_arn)
model = ESM3SageMakerClient(
    endpoint_name=ENDPOINT_NAME,
    model=model_name_for_esmc,
)
protein = ESMProtein(sequence="AAAAA")
protein_tensor = model.encode(protein)
logits_output = model.logits(
    protein_tensor, LogitsConfig(sequence=True, return_embeddings=True)
)
print("Logits:", logits_output.logits)
print("Embeddings:", logits_output.embeddings)

Downgrading to esm==3.1.1 resolves this error, but then the original LogitsConfig error reappears.
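Since each pinned version fixes one error and reintroduces the other, this looks like a mismatch between the client library and what the endpoint expects; a quick stdlib-only sketch to confirm which client version is active:

# Print the installed esm client version, to compare against the
# version the SageMaker endpoint was deployed with.
from importlib.metadata import version
print(version("esm"))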

I am using SageMaker for this. I have created an endpoint and deployed the ESMC-300M model there.

Any suggestions/workarounds are appreciated. Thank you!

SahilBodke avatar Mar 03 '25 20:03 SahilBodke

Are these still issues on the latest version? I believe we fixed most of this.

ebetica avatar Sep 19 '25 20:09 ebetica