PaliGemma interface broken with newer versions of Transformers
System Info
As the title says, lerobot requires transformers newer than 4.48, with no upper bound yet. With the latest transformers release, 4.52.3, the following error occurs:
File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/lerobot/common/policies/pi0/modeling_pi0.py", line 319, in forward
losses = self.model.forward(images, img_masks, lang_tokens, lang_masks, state, actions, noise, time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/lerobot/common/policies/pi0/modeling_pi0.py", line 625, in forward
prefix_embs, prefix_pad_masks, prefix_att_masks = self.embed_prefix(
^^^^^^^^^^^^^^^^^^
File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/lerobot/common/policies/pi0/modeling_pi0.py", line 522, in embed_prefix
img_emb = self.paligemma_with_expert.embed_image(img)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/lerobot/common/policies/pi0/paligemma_with_expert.py", line 220, in embed_image
return self.paligemma.get_image_features(image)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
raise AttributeError(
AttributeError: 'PaliGemmaForConditionalGeneration' object has no attribute 'get_image_features'
Further investigation reveals that the `get_image_features` interface that lerobot relies on has been removed from `PaliGemmaForConditionalGeneration` in this latest version:
https://github.com/huggingface/transformers/blob/v4.52.3/src/transformers/models/paligemma/modeling_paligemma.py
Information
- [x] One of the scripts in the examples/ folder of LeRobot
- [ ] My own task or dataset (give details below)
Reproduction
- Start a new environment.
- Install lerobot and the latest transformers (I think the latest is what gets installed by default).
- Try to run inference or fine-tuning with pi0.
Expected behavior
Inference and fine-tuning with pi0 work without errors.
I ran into the same breakage. The following change resolves it for me:
def embed_image(self, image: torch.Tensor):
    return self.paligemma.model.get_image_features(image)
You may also need this, but I'm not sure whether it is correct:
def embed_language_tokens(self, tokens: torch.Tensor):
    return self.paligemma.model.get_input_embeddings()(tokens)
Or you can just pin an older version:
pip install transformers==4.48.1
Hey @LumenYoung 👋 thank you so much for the issue ⭐
We think this is one of the problems we've been having with pi0, but it's a bit hard to see this through without a minimal reproducing example. Can you perhaps share one here? Happy to dig into it myself!
I hit this problem too.
It seems that transformers changed its API between v4.51.3 and v4.52.0:
https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/models/paligemma/modeling_paligemma.py#L403
https://github.com/huggingface/transformers/blob/v4.52.0/src/transformers/models/paligemma/modeling_paligemma.py#L392
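A version check is one way to decide which code path to take. The sketch below assumes the observation above is accurate, i.e. that 4.52.0 is the first release with the changed layout; the constant and function names are hypothetical. It uses only the standard library and ignores pre-release suffixes such as `.dev0`, so a development build of 4.52.0 would be treated as 4.52.0.

```python
import re

# Assumption taken from this thread: the API changed between 4.51.3 and 4.52.0.
FIRST_CHANGED_RELEASE = (4, 52, 0)


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) from a version string, ignoring suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def uses_new_paligemma_layout(version: str) -> bool:
    """True if the given transformers version is at or past the assumed change."""
    return parse_release(version) >= FIRST_CHANGED_RELEASE
```

In practice the argument would be `transformers.__version__`, for example `uses_new_paligemma_layout(transformers.__version__)`.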