
flex attention in PI0

Open Wonder1905 opened this issue 10 months ago • 0 comments

System Info

- `lerobot` version: 0.1.0
- Platform: Linux-6.8.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.10.16
- Huggingface_hub version: 0.29.3
- Dataset version: 3.3.2
- Numpy version: 2.1.3
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Cuda version: 12040
- Using GPU in script?: yes

Information

  • [x] One of the scripts in the examples/ folder of LeRobot
  • [ ] My own task or dataset (give details below)

Reproduction

- git clone the repo, build the conda env, and install as described in the README

- In configuration_pi0.py, change the attention implementation from eager to flex (see the sketch after these steps):
  attention_implementation: str = "flex"  # or fa2, flex

- Run:
  python lerobot/scripts/train.py \
      --policy.type=pi0 \
      --dataset.repo_id=danaaubakirova/koch_test \
      --batch_size 2
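For context, a minimal sketch of that config edit, assuming the field lives on the PI0 policy config dataclass (the surrounding class here is illustrative, not the actual LeRobot definition):

```python
# lerobot/common/policies/pi0/configuration_pi0.py (illustrative excerpt)
from dataclasses import dataclass

@dataclass
class PI0Config:
    # default is "eager"; switching to "flex" triggers the error below
    attention_implementation: str = "flex"  # or fa2, flex
```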

Expected behavior

This is the error trace:

File "lerobot/configs/parser.py", line 227, in wrapper_inner response = fn(cfg, *args, **kwargs) File "lerobot/scripts/train.py", line 212, in train train_tracker, output_dict = update_policy( File "lerobot/scripts/train.py", line 71, in update_policy loss, output_dict = policy.forward(batch) File "lerobot/common/policies/pi0/modeling_pi0.py", line 319, in forward losses = self.model.forward(images, img_masks, lang_tokens, lang_masks, state, actions, noise, time) File "lerobot/common/policies/pi0/modeling_pi0.py", line 636, in forward (_, suffix_out), _ = self.paligemma_with_expert.forward( File "lerobot/common/policies/pi0/paligemma_with_expert.py", line 300, in forward att_output = attention_interface( File "lerobot/common/policies/pi0/flex_attention.py", line 124, in flex_attention_forward attn_output, attention_weights = flex_attention( File "torch/nn/attention/flex_attention.py", line 1293, in flex_attention raise ValueError( ValueError: block_mask was created for block_mask.shape=(2, 8, 640, 640) but got q_len=611 and kv_len=611. As the block mask was created for a larger length than you're using it for, you can either 1. create a new block mask with the correct length, or 2. 'adjust' the existing block mask to the correct length by calling block_mask._adjust(q_len, kv_len). This essentially 'crops' the block mask to the upper left corner, which does not work for all mask_mods!

The error is pretty straightforward: the block mask is built for length 640 (it has to be a multiple of 128) while the inputs have length 611, so the size mismatch makes sense. My expected behavior is that the code runs with flex attention.
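For reference, a minimal self-contained sketch of the two workarounds the error message itself suggests, using the shapes from the trace (batch 2, 8 heads, q_len 611, mask padded to 640). The mask_mod, head_dim, and tensor names are placeholders, not the actual PI0 code:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

def mask_mod(b, h, q_idx, kv_idx):
    # placeholder causal mask; PI0 builds its own prefix/suffix mask
    return q_idx >= kv_idx

B, H, seq_len, head_dim = 2, 8, 611, 64  # shapes from the trace; head_dim is a guess
q = torch.randn(B, H, seq_len, head_dim, device="cuda")
k = torch.randn(B, H, seq_len, head_dim, device="cuda")
v = torch.randn(B, H, seq_len, head_dim, device="cuda")

# Option 1: build the block mask for the actual (unpadded) length so the
# lengths seen by flex_attention match.
block_mask = create_block_mask(mask_mod, B, H, seq_len, seq_len, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)

# Option 2: keep the mask built for the padded length (640) and crop it.
# Per the error message this only works for mask_mods that survive cropping.
padded_mask = create_block_mask(mask_mod, B, H, 640, 640, device="cuda")
out = flex_attention(q, k, v, block_mask=padded_mask._adjust(seq_len, seq_len))
```

In the failing call the mask is presumably built for the padded length while q/k/v are passed unpadded, so either of the above (or padding q/k/v to 640 before the call) would make the lengths agree.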

I assume older versions of flex attention handled this mismatch automatically and newer ones no longer do?

I'd be happy for any help.

Wonder1905 · Mar 14 '25 13:03