lerobot Add torchcodec cpu

What this does

This PR replaces torchvision CPU decoding with torchcodec CPU decoding. It also adds a decode_video_frames function that wraps multiple backends, instead of calling each decode_video_frames_BACKENDNAME function separately. This makes the code more efficient and allows us to add more decoders later on!

The decoder is selected by the dataset.video_backend key and defaults to torchcodec.
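The single-entry-point idea can be sketched as a small registry that maps backend names to decoder functions and falls back to torchcodec. This is a minimal illustration, not the code in this PR: the two backend functions are hypothetical stubs that return strings instead of decoded image tensors, and the backend parameter name and registry are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-ins for the per-backend decoders. The real functions
# open a video file and return decoded frames; these stubs only return
# labels so the dispatch logic can run without any video dependencies.
def _decode_with_torchcodec(video_path: str, timestamps: Sequence[float]) -> List[str]:
    return [f"torchcodec:{video_path}@{t}" for t in timestamps]


def _decode_with_pyav(video_path: str, timestamps: Sequence[float]) -> List[str]:
    return [f"pyav:{video_path}@{t}" for t in timestamps]


# Registry of available backends: supporting a new decoder later only
# requires adding one entry here.
_BACKENDS: Dict[str, Callable[[str, Sequence[float]], List[str]]] = {
    "torchcodec": _decode_with_torchcodec,
    "pyav": _decode_with_pyav,
}

DEFAULT_BACKEND = "torchcodec"


def decode_video_frames(
    video_path: str, timestamps: Sequence[float], backend: Optional[str] = None
) -> List[str]:
    """Dispatch to the decoder named by `backend` (hypothetical name), defaulting to torchcodec."""
    name = backend or DEFAULT_BACKEND
    try:
        decoder = _BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"Unsupported video backend: {name!r}. Choose from {sorted(_BACKENDS)}"
        ) from None
    return decoder(video_path, timestamps)
```

Callers pass the configured backend name (for example, the value of dataset.video_backend) and never call a backend-specific function directly, so unknown names fail early with a clear error instead of silently picking a decoder.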

How it was tested

Tested and benchmarked the decoders on different datasets and policies.

How to checkout & try? (for the reviewer)

Run the training script with a dataset that contains videos to decode, for example:

python lerobot/scripts/train.py \
    --output_dir=outputs/train/act_aloha_insertion \
    --policy.type=act \
    --dataset.repo_id=lerobot/aloha_sim_insertion_human \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0

Benchmarks

Ran one benchmark on the lerobot/aloha_sim_insertion_human_image dataset, comparing PyAV and TorchCodec (CPU):

Metric	PyAV	TorchCodec (CPU)
Video-to-images load time ratio (lower is better)	1.87	1.25
Avg MSE (lower is better)	5.14e-05	4.88e-05
Avg PSNR in dB (higher is better)	43.17	43.37
Avg SSIM (higher is better)	0.995	0.995
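The quality and speed metrics in the table can be illustrated with a minimal pure-Python sketch. It assumes pixel values normalized to [0, 1] (hence data_range=1.0) and treats a frame as a flat list of values; it is not the benchmark script that produced these numbers, and SSIM is left out because it needs windowed local statistics. The last lines turn the two load-time ratios into a relative speedup, assuming both ratios are measured against the same image-loading baseline.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], decoded: Sequence[float]) -> float:
    """Mean squared error between two equally sized lists of pixel values."""
    if not reference or len(reference) != len(decoded):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def psnr(reference: Sequence[float], decoded: Sequence[float], data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    err = mse(reference, decoded)
    if err == 0:
        return math.inf  # identical frames: no noise at all
    return 10 * math.log10(data_range ** 2 / err)


# Load-time ratio = time to decode frames from video / time to load the
# original images. With a shared baseline, the ratio of the two ratios is
# the relative decoding speedup of TorchCodec over PyAV.
pyav_ratio, torchcodec_ratio = 1.87, 1.25
speedup = pyav_ratio / torchcodec_ratio  # about 1.5x
```

If the table's averages are taken per frame, the average PSNR will generally not equal the PSNR computed from the average MSE, because the logarithm is not linear.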

What's left

(Resolved) Remove/suppress libdav1d logs, which are noisy: there is currently no environment variable to disable them, but they will be deactivated in the next torchcodec release.

PR is in a good state ✅

Mar 03 '25 06:03 jadechoghari

TorchCodec consistently outperforms PyAV across all datasets and video codecs (encoders): it achieves lower MSE (better accuracy), higher PSNR (better quality), and higher SSIM (better perceptual similarity). This trend holds across libsvtav1, libx264, and libx265, which makes TorchCodec the better choice for both efficiency and quality. To reproduce the full results, check this link.

Mar 08 '25 08:03 jadechoghari

Great! cc @imstevenpmwork

Mar 14 '25 14:03 jadechoghari

Hello @jadechoghari, thanks for your contribution! This LGTM 😄

Mar 14 '25 15:03 imstevenpmwork
