
Add Esm

Open pass-lin opened this issue 7 months ago • 19 comments

From https://github.com/keras-team/keras-hub/issues/2177. The snippet below achieves a smaller error against the HF implementation.

import os

# Select the backend before importing Keras.
os.environ["KERAS_BACKEND"] = "torch"
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # optional HF mirror

from keras import ops
from transformers.models.esm.modeling_esm import EsmModel

from keras_hub.src.models.esm.esm_backbone import ESMBackbone

# Reference Hugging Face model.
weights_path = "facebook/esm2_t6_8M_UR50D"
hf_model = EsmModel.from_pretrained(weights_path)
hf_model.cuda().eval()
hf_model.embeddings.token_dropout = False

# Converted KerasHub model, loaded from the same Hugging Face checkpoint.
keras_model = ESMBackbone.from_preset("hf://" + weights_path)
keras_model.summary()

# Run the same forward pass through both models and compare the outputs.
x = ops.array([[1, 2, 3, 4, 5]]) + 1
hf_out = hf_model(x, ops.ones_like(x))[0]
keras_out = keras_model({"token_ids": x})

print(ops.all(ops.isclose(hf_out, keras_out, atol=1e-4)))
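
To quantify the remaining mismatch rather than only printing a boolean, a small follow-up sketch (reusing hf_out and keras_out from above) could also report the worst-case and average error:

# Report the elementwise error magnitudes so the chosen tolerance can be judged directly.
abs_diff = ops.abs(hf_out - keras_out)
print("max abs error:", float(ops.max(abs_diff)))
print("mean abs error:", float(ops.mean(abs_diff)))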



ESM Checkpoint Conversion and Numerics Verification Demo (across multiple backends): Notebook Link

Train Demo: Notebook Link

pass-lin avatar May 03 '25 07:05 pass-lin

ruff.....................................................................Passed
ruff-format..............................................................Passed
Error: Process completed with exit code 1.

Please help me figure out how to solve this problem.

pass-lin avatar May 03 '25 07:05 pass-lin

Probably an issue with generating the API symbols. It looks like you need to sync with the latest changes on master; then you could try running ./shell/api_gen.sh

mattdangerw avatar May 06 '25 18:05 mattdangerw

ruff.....................................................................Passed
ruff-format..............................................................Passed
Error: Process completed with exit code 1.

Please help me figure out how to solve this problem.

You can rebase onto the latest master code and then run: pre-commit run --all-files. You may also need to run: pip install -U namex

sachinprasadhs avatar May 09 '25 17:05 sachinprasadhs

keras_hub/src/layers/modeling/reversible_embedding_test.py::ReversibleEmbeddingTest::test_quantize_dtype_argument_tie_weights - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/layers/modeling/reversible_embedding_test.py::ReversibleEmbeddingTest::test_quantize_dtype_argument_untie_weights - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/layers/modeling/reversible_embedding_test.py::ReversibleEmbeddingTest::test_quantize_int8_tie_weights - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/layers/modeling/reversible_embedding_test.py::ReversibleEmbeddingTest::test_quantize_int8_untie_weights - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/albert/albert_backbone_test.py::AlbertBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/bart/bart_backbone_test.py::BartBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/bert/bert_backbone_test.py::BertBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/bloom/bloom_backbone_test.py::BloomBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/clip/clip_backbone_test.py::CLIPBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/deberta_v3/deberta_v3_backbone_test.py::DebertaV3BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/distil_bert/distil_bert_backbone_test.py::DistilBertBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/electra/electra_backbone_test.py::ElectraBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/f_net/f_net_backbone_test.py::FNetBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/falcon/falcon_backbone_test.py::FalconBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/gemma/gemma_backbone_test.py::GemmaBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/gemma/gemma_backbone_test.py::Gemma2BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/gpt2/gpt2_backbone_test.py::GPT2BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/gpt_neo_x/gpt_neo_x_backbone_test.py::GPTNeoXBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/llama/llama_backbone_test.py::LlamaTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/mistral/mistral_backbone_test.py::MistralBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/opt/opt_backbone_test.py::OPTBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/pali_gemma/pali_gemma_backbone_test.py::PaliGemmaBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/pali_gemma/pali_gemma_backbone_test.py::PaliGemma2BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/phi3/phi3_backbone_test.py::Phi3Test::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/phi3/phi3_backbone_test.py::Phi3Test::test_backbone_basics_with_su_rotary - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/roberta/roberta_backbone_test.py::RobertaBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/siglip/siglip_backbone_test.py::SigLIPBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/siglip/siglip_backbone_test.py::SigLIP2BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/t5/t5_backbone_test.py::T5BackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/whisper/whisper_backbone_test.py::WhisperBackboneTest::test_backbone_basics - TypeError: _int8_build() takes 2 positional arguments but 3 were given
FAILED keras_hub/src/models/xlm_roberta/xlm_roberta_backbone_test.py

@mattdangerw @sachinprasadhs Is this a problem with the test environment? Why are there so many failures that are unrelated to my changes?

pass-lin avatar May 10 '25 13:05 pass-lin

It's not related to your code; it looks like an issue with the JAX backend. We will look into it.

sachinprasadhs avatar May 12 '25 17:05 sachinprasadhs

@sachinprasadhs @mattdangerw Can anybody review my code?

pass-lin avatar May 17 '25 18:05 pass-lin

@mattdangerw @sachinprasadhs Please check my code, thank you.

pass-lin avatar Jun 02 '25 18:06 pass-lin

Once you address all the comments, add an end-to-end working Colab, along with the checkpoint conversion script under keras-hub/tools/checkpoint_conversion.

sachinprasadhs avatar Jun 02 '25 19:06 sachinprasadhs

Once you address all the comments, add an end-to-end working Colab, along with the checkpoint conversion script under keras-hub/tools/checkpoint_conversion.

Ok, please check the new code.

pass-lin avatar Jun 03 '25 10:06 pass-lin

Thanks, a few minor comments.

Also, we need more details on the issue specific to Keras 3.6 and older versions.

Finally, in the PR description, add the Colab notebook to show end-to-end working of the model and numerics verification. You can follow the PR description template from a recent PR.

How do I add a Colab notebook? Can you give me a demo?

pass-lin avatar Jun 11 '25 05:06 pass-lin

Thanks, a few minor comments. Also, we need more details on the issue specific to Keras 3.6 and older versions. Finally, in the PR description, add the Colab notebook to show end-to-end working of the model and numerics verification. You can follow the PR description template from a recent PR.

How do I add a Colab notebook? Can you give me a demo?

Here is an example from one of the recent PRs that got merged; you can do something like this:

  • DeiT Checkpoint Conversion and Numerics Verification Demo (across multiple backends): Notebook Link

  • DeiT End-to-End Demo (zero-shot and finetuning): Notebook Link

  • Here are the converted DeiT presets from Hugging Face checkpoints for reference.

sachinprasadhs avatar Jun 11 '25 18:06 sachinprasadhs

Thanks, a few minor comments. Also, we need more details on the issue specific to Keras 3.6 and older versions. Finally, in the PR description, add the Colab notebook to show end-to-end working of the model and numerics verification. You can follow the PR description template from a recent PR.

How do I add a Colab notebook? Can you give me a demo?

Here is an example from one of the recent PRs that got merged; you can do something like this:

  • DeiT Checkpoint Conversion and Numerics Verification Demo (across multiple backends): Notebook Link
  • DeiT End-to-End Demo (zero-shot and finetuning): Notebook Link
  • Here are the converted DeiT presets from Hugging Face checkpoints for reference.

Hello, I've already added the Colab demo of tools/checkpoint_conversion/convert_esm_checkpoints.py in the PR description. I think this is enough, and we can refer to BERT for the rest. Can we merge now?

pass-lin avatar Jun 12 '25 07:06 pass-lin

We don't have access to view the notebook; can you make it public? Thanks.

sachinprasadhs avatar Jun 12 '25 17:06 sachinprasadhs

We don't have access to view the notebook; can you make it public? Thanks.

OK, sharing has been enabled.

pass-lin avatar Jun 12 '25 18:06 pass-lin

Hi, the intention of the notebook is to verify the correctness of the model, including the backbone and tasks, with usage details and the expected outcome, and to verify numerical stability after the weights are transferred to the Keras architecture, via a forward pass.

sachinprasadhs avatar Jun 13 '25 20:06 sachinprasadhs

Hi, the intention of the notebook is to verify the correctness of the model, including the backbone and tasks, with usage details and the expected outcome, and to verify numerical stability after the weights are transferred to the Keras architecture, via a forward pass.

Okay, I've added another notebook, which is a demo for predicting the suitable pH of enzymes using ESM.
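
For reference, a minimal sketch of such a pH-prediction head (hypothetical; it assumes the ESMBackbone loaded in the comparison script at the top of this thread returns a [batch, seq_len, hidden_dim] tensor, and the training-data variable names are illustrative):

import keras
from keras import layers

# Hypothetical regression head on top of the ESM backbone features.
# keras_model is the ESMBackbone loaded earlier via from_preset.
token_ids = keras.Input(shape=(None,), dtype="int32", name="token_ids")
features = keras_model({"token_ids": token_ids})
pooled = layers.GlobalAveragePooling1D()(features)
ph_output = layers.Dense(1, name="ph")(pooled)
regressor = keras.Model(token_ids, ph_output)
regressor.compile(optimizer="adam", loss="mse", metrics=["mae"])
# regressor.fit(train_token_ids, train_ph, epochs=3)
# print(regressor.predict(sample_token_ids))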

pass-lin avatar Jun 14 '25 05:06 pass-lin

You can remove the esm2_t6_8M directory; it will be generated by the conversion script you have provided and will be uploaded to Kaggle.

The notebook you have provided doesn't call the predict method; take any suitable sample input and display the output with predict.

Also, in your conversion script you have used atol=1e-3; what would the error percentage be at atol=1e-4 (see the sketch after the list below)? We need the following things in your notebook:

  • Numerics verification: load the original ESM model and do a forward pass, do the same forward pass with the KerasHub ESM implementation, and compare the numerics layer by layer to show whether they match (preferably to 1e-4 precision).
  • Demonstrate usage of the preprocessor, tokenizer, and other functionality of ESM.
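
For the atol question above, a quick hypothetical check (reusing hf_out and keras_out from the comparison script at the top of this thread) could report the fraction of output elements that agree at several tolerances:

# Hypothetical sketch: fraction of output elements within each tolerance.
for atol in (1e-3, 1e-4, 1e-5):
    close = ops.cast(ops.isclose(hf_out, keras_out, atol=atol), "float32")
    print(f"atol={atol:g}: {float(ops.mean(close)) * 100:.2f}% of elements match")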

I have provided the reference notebooks; please refer to those.

Please keep only the ESM changes in this PR; you can create a new PR for RoFormer, which also needs a checkpoint conversion script, so that we can maintain the latest weights on Kaggle by regenerating them with the script after any future KerasHub changes to the model.

sachinprasadhs avatar Jun 16 '25 21:06 sachinprasadhs

You can remove the esm2_t6_8M directory; it will be generated by the conversion script you have provided and will be uploaded to Kaggle.

The notebook you have provided doesn't call the predict method; take any suitable sample input and display the output with predict.

Also, in your conversion script you have used atol=1e-3; what would the error percentage be at atol=1e-4? We need the following things in your notebook:

  • Numerics verification: load the original ESM model and do a forward pass, do the same forward pass with the KerasHub ESM implementation, and compare the numerics layer by layer to show whether they match (preferably to 1e-4 precision).
  • Demonstrate usage of the preprocessor, tokenizer, and other functionality of ESM.

I have provided the reference notebooks; please refer to those.

Please keep only the ESM changes in this PR; you can create a new PR for RoFormer, which also needs a checkpoint conversion script, so that we can maintain the latest weights on Kaggle by regenerating them with the script after any future KerasHub changes to the model.

OK, I have modified the notebook, please check. In addition, RoFormerV2 does not need a conversion script; it is a native Keras model. I just updated the Keras 2 API usage.

pass-lin avatar Jun 17 '25 06:06 pass-lin

@sachinprasadhs Please check my notebook.

pass-lin avatar Jun 28 '25 07:06 pass-lin

Hi, your notebook still does not demonstrate actual use-case examples like https://huggingface.co/docs/transformers/en/model_doc/esm#transformers.EsmForSequenceClassification.forward.example, https://huggingface.co/docs/transformers/en/model_doc/esm#transformers.EsmForProteinFolding.forward.example, or https://huggingface.co/docs/transformers/en/model_doc/esm#transformers.EsmForTokenClassification.forward.example; please include one.

sachinprasadhs avatar Jul 09 '25 21:07 sachinprasadhs

Hi, your notebook still does not demonstrate actual use-case examples like https://huggingface.co/docs/transformers/en/model_doc/esm#transformers.EsmForSequenceClassification.forward.example, https://huggingface.co/docs/transformers/en/model_doc/esm#transformers.EsmForProteinFolding.forward.example, or https://huggingface.co/docs/transformers/en/model_doc/esm#transformers.EsmForTokenClassification.forward.example; please include one.

We've included a training demo for ESM. As for ESMFold, that would be a brand-new PR. Could you point to exactly which demo I should add? Sorry for the trouble.

pass-lin avatar Jul 10 '25 02:07 pass-lin

Any demo using the implementation you have that predicts on actual data or sample input data and displays the output in the existing Colab. Also, remove the folder/directory named esm2_t6_8M from your code; the rest looks good. Thanks for all the work.

sachinprasadhs avatar Jul 10 '25 22:07 sachinprasadhs

/gemini review

divyashreepathihalli avatar Jul 11 '25 00:07 divyashreepathihalli

Any demo using the implementation you have that predicts on actual data or sample input data and displays the output in the existing Colab. Also, remove the folder/directory named esm2_t6_8M from your code; the rest looks good. Thanks for all the work.

I’m not sure what you mean by “delete the esm2_t6_8M directory.”

Looking at the reference demo notebook, all it does is install the environment, set the OS environment variables, and then run:

python tools/checkpoint_conversion/convert_deit_checkpoints.py --preset deit-base-distilled-patch16-384

In my notebook I did exactly the same thing: installed the environment, set the OS environment variables, and then ran:

python tools/checkpoint_conversion/convert_esm_checkpoints.py --preset esm2_t6_8M

Could you give a more precise and detailed description of which notebook has the problem and what it is missing compared to the reference notebook?
In the reference notebook, what exactly shows that the esm2_t6_8M directory should be removed?

Further, in another notebook I explicitly provide demonstrations of predict, fit, and evaluate. What exactly is still missing?

[screenshots of the notebook cells demonstrating predict, fit, and evaluate]

A clear description would be greatly appreciated; thank you for your help, and sorry for the extra work that writing a detailed description causes you.

pass-lin avatar Jul 11 '25 08:07 pass-lin

/gemini review

Thanks, I fixed some errors based on Gemini's review.

pass-lin avatar Jul 11 '25 08:07 pass-lin

In your commit there are generated checkpoint files. We don't keep these files in our GitHub repo; we upload the checkpoints to Kaggle / Hugging Face, and the files will be regenerated by running the conversion script you have provided. You don't need to include the converted checkpoints in your commit.

sachinprasadhs avatar Jul 14 '25 21:07 sachinprasadhs

In your commit there are generated checkpoint files. We don't keep these files in our GitHub repo; we upload the checkpoints to Kaggle / Hugging Face, and the files will be regenerated by running the conversion script you have provided. You don't need to include the converted checkpoints in your commit.

I'm so sorry, this is an intermediate artifact from an earlier test run. I didn't notice its existence. Thanks very much for the reminder.

pass-lin avatar Jul 15 '25 05:07 pass-lin

@sachinprasadhs How far are we from merging now?

pass-lin avatar Jul 17 '25 16:07 pass-lin

Hi, just one small comment; the rest looks good. Please replace the head_size argument in both the backbone and classifier examples with the updated argument name, which I believe is num_heads.
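
For example, the updated backbone example might look roughly like this (hypothetical: the values follow the esm2_t6_8M configuration, and the argument names other than num_heads are assumptions based on typical KerasHub backbone signatures):

# Hypothetical docstring example with the renamed argument; assumes ESMBackbone
# is imported as in the comparison script at the top of this thread.
backbone = ESMBackbone(
    vocabulary_size=33,
    num_layers=6,
    num_heads=20,  # previously head_size
    hidden_dim=320,
    intermediate_dim=1280,
)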

sachinprasadhs avatar Jul 21 '25 17:07 sachinprasadhs

Hi, just one small comment; the rest looks good. Please replace the head_size argument in both the backbone and classifier examples with the updated argument name, which I believe is num_heads.

Thanks for the reminder, I'll make changes now.

pass-lin avatar Jul 22 '25 02:07 pass-lin