
PaliGemma interface broken for newer versions of Transformers.

Open LumenYoung opened this issue 6 months ago • 2 comments

System Info

As the title suggests, lerobot requires a transformers version higher than 4.48 (with no upper bound yet), but the following error occurs when running with the latest transformers version, 4.52.3.

  File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/lerobot/common/policies/pi0/modeling_pi0.py", line 319, in forward
    losses = self.model.forward(images, img_masks, lang_tokens, lang_masks, state, actions, noise, time)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/lerobot/common/policies/pi0/modeling_pi0.py", line 625, in forward
    prefix_embs, prefix_pad_masks, prefix_att_masks = self.embed_prefix(
                                                      ^^^^^^^^^^^^^^^^^^
  File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/lerobot/common/policies/pi0/modeling_pi0.py", line 522, in embed_prefix
    img_emb = self.paligemma_with_expert.embed_image(img)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/lerobot/common/policies/pi0/paligemma_with_expert.py", line 220, in embed_image
    return self.paligemma.get_image_features(image)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiaye.yang/.local/share/mamba/envs/il/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'PaliGemmaForConditionalGeneration' object has no attribute 'get_image_features'


Further investigation reveals that the `get_image_features` interface of `PaliGemmaForConditionalGeneration`, which lerobot relies on, has been removed in this latest version.

https://github.com/huggingface/transformers/blob/v4.52.3/src/transformers/models/paligemma/modeling_paligemma.py

Information

  • [x] One of the scripts in the examples/ folder of LeRobot
  • [ ] My own task or dataset (give details below)

Reproduction

  1. Start a new environment.
  2. Install lerobot and the latest transformers (which I think is what gets installed by default).
  3. Try to run inference or fine-tune with pi0.

Expected behavior

works.

LumenYoung avatar May 25 '25 21:05 LumenYoung

I ran into the same breakage. This change can solve the issue:

    def embed_image(self, image: torch.Tensor):
        return self.paligemma.model.get_image_features(image)

You may also need this, but I'm not sure whether it is correct:

    def embed_language_tokens(self, tokens: torch.Tensor):
        return self.paligemma.model.get_input_embeddings()(tokens)

Or you can just pin the older version:

pip install transformers==4.48.1
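
If pinning is not an option, a version-tolerant lookup might avoid hard-coding either layout. The following is only a sketch, assuming the two layouts seen in this thread (the method on the top-level object, as in the traceback, or on its `.model` attribute, as in the workaround above). The helper name `resolve_image_feature_fn` is hypothetical and not part of lerobot or transformers.

```python
def resolve_image_feature_fn(paligemma):
    """Return a callable that computes image features on either API layout.

    Hypothetical helper: it only assumes the two layouts discussed in this
    issue and does not import transformers itself.
    """
    # Older layout (as in the traceback): method on the top-level model.
    fn = getattr(paligemma, "get_image_features", None)
    if callable(fn):
        return fn

    # Newer layout (as in the workaround above): method on the `.model` attribute.
    inner = getattr(paligemma, "model", None)
    fn = getattr(inner, "get_image_features", None)
    if callable(fn):
        return fn

    raise AttributeError(
        "get_image_features not found on the model or on its `.model` attribute"
    )
```

Inside `embed_image`, this could be used as `return resolve_image_feature_fn(self.paligemma)(image)`. Whether the two layouts return identically shaped features is not verified here.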

cyteena avatar May 26 '25 06:05 cyteena

Hey @LumenYoung 👋 thank you so much for the issue ⭐

  • We think this is one of the problems we've been having with pi0, but it's a bit hard to see this through without a minimally reproducing example.
  • Can you perhaps share it here? Happy to dig into it myself!

fracapuano avatar Jun 19 '25 14:06 fracapuano

I hit the problem, too.

It seems that transformers changed its API between v4.51.3 and v4.52.0:

https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/models/paligemma/modeling_paligemma.py#L403

https://github.com/huggingface/transformers/blob/v4.52.0/src/transformers/models/paligemma/modeling_paligemma.py#L392
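
To check which side of that boundary an environment is on, something like the sketch below could help. It assumes the change landed in v4.52.0, as the links above suggest; the function names are hypothetical, and pre-release suffixes such as `.dev0` are simply ignored, which is a simplification.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Turn '4.52.3' or '4.52.0.dev0' into a 3-tuple of integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad '4.52' to (4, 52, 0)
    return tuple(parts[:3])


def is_after_api_change(version: str, boundary: str = "4.52.0") -> bool:
    """True if `version` is at or past the assumed boundary release."""
    return parse_release(version) >= parse_release(boundary)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    else:
        print(found, "-> after API change:", is_after_api_change(found))
```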

ymd-h avatar Jul 21 '25 00:07 ymd-h