Attribute `type` missing from model training artifacts from `train.py`

Open ankile opened this issue 6 months ago • 2 comments

System Info

- `lerobot` version: 0.1.0
- Platform: Linux-6.8.0-58-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- Huggingface_hub version: 0.25.0
- Dataset version: 3.6.0
- Numpy version: 1.26.0
- PyTorch version (GPU?): 2.7.0+cu126 (True)
- Cuda version: 12060
- Using GPU in script?: Yes, NVIDIA GeForce RTX 4090

Information

  • [x] One of the scripts in the examples/ folder of LeRobot
  • [ ] My own task or dataset (give details below)

Reproduction

When trying to run

policy = DiffusionPolicy.from_pretrained(pretrained_policy_path)

with the output of running the training script like so:

python lerobot/scripts/train.py \
    --dataset.repo_id=ankile/franka-lift \
    --policy.type=diffusion \
    --wandb.enable=true

I get the following error: draccus.utils.ParsingError: Expected a dict with a 'type' key for <class 'lerobot.configs.policies.PreTrainedConfig'>

In other words, why doesn't the training script store a complete config.json in this case?

Expected behavior

I would expect the train.py script, when run with one of the default training configs, to output checkpoint files that I can load without errors.

ankile, May 12 '25 22:05

train.py will output default configs. You can find them in your output dir, e.g. outputs/train/2025-05-19/16-29-03_xarm_act/checkpoints/last/pretrained_model (see config.json there, or dig deeper for the training config).

The error that you're getting when attempting to train occurs because the script does not know which env type you want to use. You can fix this by adding this flag to your training command: --env.type=<env>

where <env> is one of the following:

  • aloha
  • xarm
  • pusht

brainwavecoder9, May 19 '25 21:05

This is probably caused by a newer draccus version. With draccus==0.10.0, draccus.dump() automatically writes a type key into config.json after training. With draccus>=0.11.0, however, type is not included by default, which causes the error when the config is loaded.
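For a checkpoint that has already been trained, one possible workaround is to add the missing key by hand instead of retraining. The sketch below is not part of lerobot: `ensure_policy_type` is a hypothetical helper, and it assumes that `type` belongs at the top level of config.json and that its value matches the `--policy.type` used for training (here `diffusion`).

```python
import json
from pathlib import Path


def ensure_policy_type(config_path, policy_type):
    """Add a top-level "type" key to a config.json if it is missing.

    Hypothetical helper, not a lerobot API. Returns True if the file
    was modified and False if a "type" key was already present.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    if "type" in config:
        return False
    # Key order is cosmetic; "type" is placed first for readability.
    patched = {"type": policy_type, **config}
    path.write_text(json.dumps(patched, indent=4))
    return True
```

It would be called on the config.json inside the checkpoint's pretrained_model directory, e.g. `ensure_policy_type(Path(pretrained_policy_path) / "config.json", "diffusion")`, before calling `from_pretrained` again.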

This issue seems to be addressed in the recent PR https://github.com/huggingface/lerobot/pull/1022, which pins draccus==0.10.0.

Try re-installing draccus at the pinned version with pip install draccus==0.10.0, then re-run the training and the policy instantiation.
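To confirm which draccus version is active in the environment before retraining, a small check like the following may help. It is only a sketch: the helper names are made up for illustration, and the 0.11.0 threshold is taken from the observation above rather than from the draccus changelog.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a version string such as "0.11.0" into a tuple like (0, 11, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    # Pad short versions ("0.11") so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def omits_type_key(draccus_version):
    """Assumption from this thread: draccus>=0.11.0 no longer writes "type"."""
    return parse_version(draccus_version) >= (0, 11, 0)


def installed_draccus_version():
    """Return the installed draccus version string, or None if not installed."""
    try:
        return version("draccus")
    except PackageNotFoundError:
        return None
```

If `omits_type_key(installed_draccus_version())` is true for an installed version, checkpoints written by that environment would be expected to hit the ParsingError described in this issue.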

Related issue: https://github.com/huggingface/lerobot/issues/1105

tatsukamijo, Jun 04 '25 13:06