lerobot
fix(deps): constrain PyAV version to resolve OpenCV-python ffmpeg version conflict
During the installation of lerobot, we resolve to the following dependency versions:
opencv-python == 4.11.0.86
av == 14.2.0
torchvision == 0.21.0
- PyAV resolves to version 14.2.0, which relies on the latest version of ffmpeg.
- OpenCV-python, however, ships its own bundled version of ffmpeg.
- Other dependencies in lerobot, such as torchvision, depend on PyAV. This leads to a runtime error due to incompatible ffmpeg versions.
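To see which versions an environment actually resolved to, a small sketch along these lines may help. It only reads package metadata through the standard library; the distribution names are assumed to be the PyPI ones, so a conda-installed OpenCV may show up as missing.

```python
from importlib import metadata

# Assumed PyPI distribution names; adjust for variants such as
# opencv-python-headless or a conda-provided OpenCV.
PACKAGES = ("opencv-python", "av", "torchvision")


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name} == {version}")
```

Pasting this output into an issue makes it easier to tell whether a report comes from the conflicting combination described above.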
In summary, the current dependency setup makes it impossible for the resolved versions of PyAV and OpenCV-python to work together. This has caused multiple runtime errors, resulting in several GitHub issues and community-proposed workarounds. However, these solutions often introduce new issues elsewhere in the codebase. Relevant discussions and issues can be found here:
- https://github.com/huggingface/lerobot/pull/757
- https://github.com/huggingface/lerobot/issues/679
- https://github.com/huggingface/lerobot/issues/742
- https://github.com/huggingface/lerobot/pull/519
Additionally, some related discussions are ongoing in the respective projects:
- https://github.com/pytorch/vision/issues/5940
- https://github.com/PyAV-Org/PyAV/issues/978
- https://github.com/opencv/opencv/issues/21952
After analyzing the situation, I’ve identified three potential solutions:
- Use OpenCV-headless: This avoids the ffmpeg dependency and resolves the conflict. However, since we rely on imshow() and other GUI functionalities for debugging and examples, this option is not feasible.
- Find compatible versions of OpenCV-python and PyAV: Identify versions of both libraries that use the same ffmpeg version, eliminating the conflict. This is the goal of the current PR.
- Manually manage dependencies:
  - Allow the dependency manager to resolve the initial versions.
  - Install PyAV's ffmpeg dependencies (libavcodec-dev libavformat-dev libavdevice-dev libavfilter-dev libavutil-dev libswresample-dev libswscale-dev).
  - Reinstall PyAV using: pip install av --no-binary av --no-cache --force-reinstall, forcing it to build against the ffmpeg version already in the environment (e.g., the one bundled with OpenCV-python).
  - However, reinstalling this way will attempt to build the latest PyAV version, which will fail because the required ffmpeg version is not available.
  - If we want to build PyAV, we need to specify a version that can be compiled against the ffmpeg version from OpenCV. As of today, this version is av>=12.3.0,<13.0.0.
  - But if this version can be built successfully, we can then skip the manual build process and simply specify this requirement in pyproject.toml, effectively aligning with solution 2. This is the change proposed in this PR.
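As a rough illustration of the proposed range, the sketch below checks whether a given PyAV version string falls inside av>=12.3.0,<13.0.0. The helper names are hypothetical and the parsing is deliberately simplified: it compares only the leading numeric components and ignores pre-release suffixes, so it is not a full PEP 440 implementation.

```python
import re

# Range proposed in this PR: av>=12.3.0,<13.0.0
LOWER = (12, 3, 0)
UPPER = (13, 0, 0)


def release_tuple(version):
    """Parse the leading 'X[.Y[.Z]]' of a version string into three ints."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Missing components (e.g. '12.3') are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def satisfies_pin(version):
    """True if the version lies in the half-open range [LOWER, UPPER)."""
    return LOWER <= release_tuple(version) < UPPER
```

With this, satisfies_pin("14.2.0") is False for the version the resolver currently picks, while satisfies_pin("12.3.0") is True.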
The changes in this PR have passed the nightly and test CI workflows and have been verified on Linux using both conda and uv as package managers. However, this solution will only remain viable as long as the OpenCV-python version resolved during installation continues to use an ffmpeg version compatible with PyAV >=12.3.0,<13.0.0.
When OpenCV-python eventually updates its bundled ffmpeg version, we can relax the PyAV dependency constraint and specify a version range that aligns with the new ffmpeg version used by OpenCV.
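One way to notice when the bundled ffmpeg changes is to compare the library versions each package reports. The sketch below parses the FFMPEG lines of cv2.getBuildInformation() and prints them next to PyAV's library_versions attribute. The exact layout of the build information text is an assumption on my side, and the reported numbers are per-library versions (avcodec, avformat, ...), not the ffmpeg release number.

```python
import re


def parse_opencv_ffmpeg(build_info):
    """Extract {library: version} from lines such as
    'avcodec:   YES (59.37.100)' in OpenCV's build information text."""
    pattern = re.compile(r"^\s*(av\w+|sw\w+):\s+YES \(([\d.]+)\)", re.MULTILINE)
    return dict(pattern.findall(build_info))


def report():
    """Print the ffmpeg library versions seen by OpenCV and by PyAV."""
    try:
        import cv2
        print("OpenCV:", parse_opencv_ffmpeg(cv2.getBuildInformation()))
    except ImportError:
        print("OpenCV: not installed")
    try:
        import av
        # getattr guards against builds that do not expose this attribute.
        print("PyAV:", getattr(av, "library_versions", "unavailable"))
    except ImportError:
        print("PyAV: not installed")
```

If the major versions printed by report() diverge between the two packages, the pinned range probably needs to be revisited.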
Regarding testing these changes: installation is not the issue; the problem is OpenCV's imshow() calls.
Did you test that with real hardware? In the general case, I really prefer not to cap dependency versions. It could be okay for now but must not remain in the release.
So I think the better solution to this issue is:
- Remove pyav completely once torchcodec is distributed more easily and we're confident it's as easy to install.
- Most importantly, remove cv2.imshow() calls. This is used in the teleop/dataset recording use case. Instead, we should either:
  - build a simple flask app and stream the images there for display. It would look a bit like the current visualize_dataset_html script
  - build a rerun app for real-time visualization (might be prettier)
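For the streaming option, one common pattern is an MJPEG stream, where each JPEG frame is sent as a part of a multipart/x-mixed-replace response. The sketch below shows only the framing, using the standard library; the boundary name and function names are hypothetical, and nothing here is taken from the lerobot codebase.

```python
BOUNDARY = b"frame"  # hypothetical boundary name, must match the response header


def mjpeg_part(jpeg_bytes):
    """Wrap one JPEG-encoded frame as a multipart/x-mixed-replace part."""
    header = b"--%s\r\nContent-Type: image/jpeg\r\nContent-Length: %d\r\n\r\n" % (
        BOUNDARY,
        len(jpeg_bytes),
    )
    return header + jpeg_bytes + b"\r\n"


def mjpeg_stream(frames):
    """Yield one multipart chunk per JPEG-encoded frame in the iterable."""
    for jpeg_bytes in frames:
        yield mjpeg_part(jpeg_bytes)
```

In a Flask view, the generator could presumably be returned as Response(mjpeg_stream(frames), mimetype="multipart/x-mixed-replace; boundary=frame"), with frames produced by something like cv2.imencode(".jpg", image)[1].tobytes(). That would keep OpenCV's GUI code out of the display path.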
I'm happy with both, as long as they fit our needs. Wanna start working on it?
To investigate the issue, I conducted tests using:
- A simple dummy app designed to reproduce the error easily:
    import numpy as np
    import cv2
    import av
    cv2.imshow("debug", np.zeros((128,128,3), dtype=np.uint8))
    cv2.waitKey(0)
- The testing scripts (which also use imshow()).
In both cases, the process would hang, and the CPU usage would spike to 100%. However, after applying the fix, both scenarios ran smoothly without any issues.
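Since the failure mode is a hang rather than an exception, it can be convenient to run the reproduction in a child process with a timeout. The sketch below is one way to do that. Note that it uses a variation of the dummy app with waitKey(1) instead of waitKey(0), so that a healthy run exits on its own rather than waiting for a key press; the helper name is hypothetical.

```python
import subprocess
import sys

# Variation of the dummy app: waitKey(1) lets a working setup terminate
# without user input, so only a real hang reaches the timeout.
REPRO = """
import numpy as np
import cv2
import av
cv2.imshow("debug", np.zeros((128, 128, 3), dtype=np.uint8))
cv2.waitKey(1)
cv2.destroyAllWindows()
"""


def terminates_within(code, timeout_s):
    """Run code in a fresh interpreter.

    Returns False if it is still running after timeout_s seconds (the child
    is killed), True if it terminated, whether successfully or with an error.
    """
    try:
        subprocess.run([sys.executable, "-c", code], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return True
```

Calling terminates_within(REPRO, 10) should return False on an affected setup and True once the versions are compatible. The call needs a display, and a True result only means the process ended, so the child's output is still worth checking.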
I completely agree that having imshow() in our codebase isn’t ideal. As I mentioned in my initial proposed solutions, it’s primarily used for debugging or example purposes. Moving forward, I’d like to transition to using OpenCV in headless mode to avoid such dependencies.
For now, I believe capping this dependency is a reasonable solution for our current needs. This approach is not only recommended by OpenCV developers (as seen in this GitHub comment), but it also aligns with the fact that these two projects operate independently. We can work towards transitioning to headless mode and removing imshow() when it becomes a priority closer to the release. However, I think this fix will provide a better experience for users today, especially since we receive a new issue related to this problem almost every week.
@Cadene What are your thoughts?
Just tested with a fresh conda environment, unfortunately cv2.imshow() still hangs for me:
Ubuntu 24.04.2
opencv-python == 4.11.0.86
av == 12.3.0
torchvision == 0.21.0
Doing it the old way
conda install -y -c conda-forge ffmpeg
pip uninstall -y opencv-python
conda install -y -c conda-forge "opencv>=4.10.0"
followed by
conda install -c conda-forge jpeg libtiff
works: the imshow window appears and python lerobot/scripts/control_robot.py --robot.type=so100 --control.type=teleoperate runs as expected.
Hello @kuz,
Thanks for reporting this! The documentation needed an update, and your report helped us spot that.
Could you do me a favor and test the following in a fresh conda environment and report if it runs as expected?
conda create -y -n lerobot python=3.10
conda activate lerobot
conda install ffmpeg
pip install --no-binary=av -e .
I’ll open a PR to get this updated 😄 Here it is: https://github.com/huggingface/lerobot/pull/907