Support multi-GPU training with accelerate
What this does
This PR adds support for training on multiple GPUs using the accelerate library.
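Conceptually, the change follows the usual accelerate pattern: wrap the training loop with an Accelerator, let prepare() place modules and shard the dataloader across processes, and route the backward pass through the Accelerator. The sketch below illustrates that pattern only; the model, dataset, and hyperparameters are placeholders, not lerobot's actual code.

import torch
import torch.nn.functional as F
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

accelerator = Accelerator()  # picks up num_processes / mixed_precision from the launcher

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))  # placeholder data
dataloader = DataLoader(dataset, batch_size=8)  # per-process batch size

# prepare() moves model/optimizer to each process's device and shards the dataloader
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward(); handles gradient sync and loss scaling
    optimizer.step()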
How it was tested
Launched training on the aloha sim environment with multiple GPUs and obtained scores similar to single-GPU training.
Example (this requires installing accelerate):
pip install accelerate
POLICY=act
ENV=aloha
TASK=AlohaTransferCube-v0
REPO_ID=lerobot/aloha_sim_transfer_cube_human
DATASET_NAME=aloha_sim_transfer_cube_human
GPUS=2  # must be set before TASK_NAME, which interpolates it
TASK_NAME=lerobot_${DATASET_NAME}_${POLICY}_gpus${GPUS}
TRAIN_DIR=$WORK/logs/lerobot/$TASK_NAME  # $WORK is your own log/scratch root
echo $TRAIN_DIR
PORT=29502
OFFLINE_STEPS=100000
EVAL_FREQ=1000
BATCH_SIZE=8  # per-process batch size; the effective batch is typically GPUS x this
EVAL_BATCH_SIZE=10
SAVE_FREQ=10000
export MUJOCO_GL=egl  # headless MuJoCo rendering for eval rollouts
python -m accelerate.commands.launch --num_processes=$GPUS --mixed_precision=fp16 --main_process_port=$PORT lerobot/scripts/train.py \
--policy.type=$POLICY \
--dataset.repo_id=$REPO_ID \
--env.type=$ENV \
--env.task=$TASK \
--output_dir=$TRAIN_DIR \
--batch_size=$BATCH_SIZE \
--steps=$OFFLINE_STEPS \
--eval_freq=$EVAL_FREQ \
--save_freq=$SAVE_FREQ \
--eval.batch_size=$EVAL_BATCH_SIZE \
--eval.n_episodes=$EVAL_BATCH_SIZE
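As a usage note, the same launch can presumably also be written with accelerate's console script, which wraps the same accelerate.commands.launch module (a sketch; the flags mirror the command above):

accelerate launch --num_processes=$GPUS --mixed_precision=fp16 --main_process_port=$PORT \
  lerobot/scripts/train.py --policy.type=$POLICY  # ...same remaining train.py arguments as above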
@bot /style
Style fixes have been applied.
[rank2]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 2 has a total capacity of 31.74 GiB of which 43.38 MiB is free. Process 3850605 has 31.69 GiB memory in use. Of the allocated memory 31.08 GiB is allocated by PyTorch, and 94.10 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

How can I solve this out-of-memory error?
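For reference, the traceback's own suggestion, together with the most direct lever (a smaller per-process batch size), would be applied before launching like this (a sketch only; neither step is guaranteed to make the model fit on a 32 GiB card):

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True  # allocator setting suggested by the error message itself
BATCH_SIZE=4  # illustrative: halve the per-process batch size from the example above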
Thank you so much for the PR! However, we're closing this one, as we recently added multi-GPU training support with accelerate: https://github.com/huggingface/lerobot/pull/2154