KeyError: 'task' when evaluating fine-tuned pi0 on local so100 arm

Open frk2 opened this issue 9 months ago • 6 comments

System Info

- `lerobot` version: 0.1.0 (Git commit a6015a55f930cdc51fdb035d68533d1434b1cf43)
- Platform: Linux-5.15.0-134-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.29.3
- Dataset version: 3.3.2
- Numpy version: 1.24.3
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Cuda version: 12040
- Using GPU in script?: Nvidia 1080Ti

Information

  • [ ] One of the scripts in the examples/ folder of LeRobot
  • [x] My own task or dataset (give details below)

Reproduction

  1. Record dataset using so100 as laid out in the examples:

python3 lerobot/scripts/control_robot.py --robot.type=so100 --control.type=record --control.fps=30 --control.single_task="Grasp tape" --control.repo_id=frk2/so100 --control.warmup_time_s=5 --control.episode_time_s=20 --control.reset_time_s=10 --control.num_episodes=2 --control.push_to_hub=true

  2. Fine-tune using the train.py script: python lerobot/scripts/train.py --dataset.repo_id=frk2/so100 --policy.path=lerobot/pi0 --output_dir=/out/so100 --job_name=act_so100_base_env --policy.device=cuda

  3. Evaluate on local so100 arm: python3 lerobot/scripts/control_robot.py --robot.type=so100 --control.fps=30 --control.single_task="Grasp tape" --control.policy.path=100000/pretrained_model --control.type=record --control.repo_id=frk2/eval_so100

You end up with:

  File "/home/faraz/Code/lerobot/lerobot/scripts/control_robot.py", line 401, in <module>
    control_robot()
  File "/home/faraz/Code/lerobot/lerobot/configs/parser.py", line 227, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
  File "/home/faraz/Code/lerobot/lerobot/scripts/control_robot.py", line 386, in control_robot
    record(robot, cfg.control)
  File "/home/faraz/Code/lerobot/lerobot/common/robot_devices/utils.py", line 42, in wrapper
    raise e
  File "/home/faraz/Code/lerobot/lerobot/common/robot_devices/utils.py", line 38, in wrapper
    return func(robot, *args, **kwargs)
  File "/home/faraz/Code/lerobot/lerobot/scripts/control_robot.py", line 302, in record
    record_episode(
  File "/home/faraz/Code/lerobot/lerobot/common/robot_devices/control_utils.py", line 200, in record_episode
    control_loop(
  File "/home/faraz/Code/lerobot/lerobot/common/datasets/image_writer.py", line 36, in wrapper
    raise e
  File "/home/faraz/Code/lerobot/lerobot/common/datasets/image_writer.py", line 29, in wrapper
    return func(*args, **kwargs)
  File "/home/faraz/Code/lerobot/lerobot/common/robot_devices/control_utils.py", line 255, in control_loop
    pred_action = predict_action(
  File "/home/faraz/Code/lerobot/lerobot/common/robot_devices/control_utils.py", line 120, in predict_action
    action = policy.select_action(observation)
  File "/home/faraz/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/faraz/Code/lerobot/lerobot/common/policies/pi0/modeling_pi0.py", line 283, in select_action
    lang_tokens, lang_masks = self.prepare_language(batch)
  File "/home/faraz/Code/lerobot/lerobot/common/policies/pi0/modeling_pi0.py", line 385, in prepare_language
    tasks = batch["task"]
KeyError: 'task'

I can make the model work by hard-coding the task string in prepare_language instead of reading it from the batch:

index bc53bf85..bc086069 100644
--- a/lerobot/common/policies/pi0/modeling_pi0.py
+++ b/lerobot/common/policies/pi0/modeling_pi0.py
@@ -381,7 +381,8 @@ class PI0Policy(PreTrainedPolicy):
     def prepare_language(self, batch) -> tuple[Tensor, Tensor]:
         """Tokenize the text input"""
         device = batch[OBS_ROBOT].device
-        tasks = batch["task"]
+        tasks = ["Grasp Tape"]
+        # tasks = batch["task"]
 
         # PaliGemma prompt has to end with a new line
         tasks = [task if task.endswith("\n") else f"{task}\n" for task in tasks]
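
A less invasive variant of this workaround might be to add the key on the caller's side, before `policy.select_action(observation)` is reached, rather than editing the policy. The sketch below is only an illustration under assumptions: `with_task` is a hypothetical helper, not a lerobot function, and the observation is a plain dict standing in for the real tensors.

```python
from typing import Any, Mapping


def with_task(observation: Mapping[str, Any], task: str) -> dict[str, Any]:
    """Return a copy of `observation` that carries a batched "task" entry.

    Hypothetical helper: it is not part of lerobot.
    """
    batch = dict(observation)  # shallow copy, the caller's dict is untouched
    if "task" not in batch:
        # prepare_language iterates over batch["task"], so wrap the string in a list
        batch["task"] = [task]
    return batch


observation = {"observation.state": [0.0, 0.1, 0.2]}  # stand-in for real tensors
batch = with_task(observation, "Grasp tape")
print(batch["task"])  # ['Grasp tape']
```

An existing "task" entry is left alone, so the helper would not override a task that the dataset or config already supplies.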


Expected behavior

No error

frk2 • Mar 24 '25 17:03

Thank you, frk2. That works, but I have another error in pi0fast:

Traceback (most recent call last):
  File "/home/sonix/lerobot/lerobot/scripts/control_robot.py", line 396, in <module>
    control_robot()
  File "/home/sonix/lerobot/lerobot/configs/parser.py", line 227, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
  File "/home/sonix/lerobot/lerobot/scripts/control_robot.py", line 381, in control_robot
    record(robot, cfg.control)
  File "/home/sonix/lerobot/lerobot/common/robot_devices/utils.py", line 42, in wrapper
    raise e
  File "/home/sonix/lerobot/lerobot/common/robot_devices/utils.py", line 38, in wrapper
    return func(robot, *args, **kwargs)
  File "/home/sonix/lerobot/lerobot/scripts/control_robot.py", line 297, in record
    record_episode(
  File "/home/sonix/lerobot/lerobot/common/robot_devices/control_utils.py", line 200, in record_episode
    control_loop(
  File "/home/sonix/lerobot/lerobot/common/datasets/image_writer.py", line 36, in wrapper
    raise e
  File "/home/sonix/lerobot/lerobot/common/datasets/image_writer.py", line 29, in wrapper
    return func(*args, **kwargs)
  File "/home/sonix/lerobot/lerobot/common/robot_devices/control_utils.py", line 265, in control_loop
    dataset.add_frame(frame)
  File "/home/sonix/lerobot/lerobot/common/datasets/lerobot_dataset.py", line 799, in add_frame
    validate_frame(frame, self.features)
  File "/home/sonix/lerobot/lerobot/common/datasets/utils.py", line 716, in validate_frame
    raise ValueError(error_message)
ValueError: The feature 'action' of dtype 'float64' is not of the expected dtype 'float32'.

fireman5379 • Apr 08 '25 13:04

After I solved the 'task' error and some other problems, I ended up with the same error. I have no idea how to solve it.

Loki-Lu • Apr 10 '25 08:04

This doesn't seem like a pi issue. It seems like you're loading a data set that is float64 but your platform is 32-bit? Are you running this in Docker, some other virtualization, or something?

frk2 • Apr 10 '25 15:04

No, I didn't use Docker. This bug seems to be in the model itself. I converted a JAX model to a Torch model and tried to adapt it to the LeRobot version. I am stuck here and have no idea how to solve it. BTW, I tested the JAX model (pi0) and it works well, but the Torch version in LeRobot performs badly. So maybe converting the JAX model to Torch was the wrong approach.

Loki-Lu • Apr 14 '25 05:04

Hi guys, did you solve the issue with ValueError: The feature 'action' of dtype 'float64' is not of the expected dtype 'float32'?

PS I've added one more line of code in common/robot_devices/control_utils.py

action = robot.send_action(pred_action)
action = action.to(dtype=torch.float32) # <<- here
action = {"action": action}
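
Before casting, it may help to see which features disagree with the dataset schema, since the ValueError reports only one feature. The helper below is a rough sketch under assumptions: `find_dtype_mismatches` is a hypothetical name, not a lerobot function, the expected dtypes are passed in as plain strings, and the frame values are stand-in objects rather than real arrays or tensors.

```python
from types import SimpleNamespace
from typing import Any, Mapping


def find_dtype_mismatches(
    frame: Mapping[str, Any], expected: Mapping[str, str]
) -> dict[str, tuple[str, str]]:
    """Map each mismatching feature name to (actual dtype, expected dtype)."""
    mismatches = {}
    for name, want in expected.items():
        if name not in frame:
            continue  # missing features are a separate problem
        value = frame[name]
        # NumPy arrays and torch tensors both expose `.dtype`; torch prints it
        # as "torch.float64", so drop that prefix before comparing.
        have = str(getattr(value, "dtype", type(value).__name__)).removeprefix("torch.")
        if have != want:
            mismatches[name] = (have, want)
    return mismatches


# Stand-in objects; a real frame would hold arrays or tensors.
frame = {
    "action": SimpleNamespace(dtype="float64"),
    "observation.state": SimpleNamespace(dtype="float32"),
}
expected = {"action": "float32", "observation.state": "float32"}
print(find_dtype_mismatches(frame, expected))  # {'action': ('float64', 'float32')}
```

Any feature this reports is a candidate for the same kind of float32 cast shown above.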

alexppppp • May 14 '25 13:05

No, I was still confused and then abandoned this approach. Does your solution work?

Loki-Lu • May 15 '25 14:05

This issue has been automatically marked as stale because it has not had recent activity (6 months). It will be closed if no further activity occurs. Any change, comment or update to this issue will reset this count. Thank you for your contributions.

github-actions[bot] • Nov 12 '25 02:11

This issue was closed because it has been stale for 14 days with no activity. Feel free to reopen if it is still relevant, or to ping a collaborator if you have any questions.

github-actions[bot] • Nov 26 '25 02:11