openpi
Running train.py in OpenPI fails with an error.
Environment: Ubuntu 24.04, RTX 5090, CUDA 12.6.3
When I run
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi0_fast_libero --exp-name=my_experiment --overwrite
I get the following error:
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi0_fast_libero --exp-name=my_experiment --overwrite
warning: The tool.uv.dev-dependencies field (used in packages/openpi-client/pyproject.toml) is deprecated and will be removed in a future release; use dependency-groups.dev instead
/home/openpi/.venv/lib/python3.11/site-packages/tyro/_parsers.py:332: UserWarning: The field model.action-expert-variant is annotated with type typing.Literal['dummy', 'gemma_300m', 'gemma_2b', 'gemma_2b_lora'], but the default value gemma_300m_lora has type <class 'str'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
  warnings.warn(message)
11:53:51.620 [I] Running on: su-station-05 (20721:train.py:195)
INFO:2025-09-20 11:53:51,899:jax._src.xla_bridge:945: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
11:53:51.899 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (20721:xla_bridge.py:945)
INFO:2025-09-20 11:53:51,900:jax._src.xla_bridge:945: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
11:53:51.900 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (20721:xla_bridge.py:945)
2025-09-20 11:53:52.207761: W external/xla/xla/stream_executor/cuda/subprocess_compilation.cc:237] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 12.0
2025-09-20 11:53:52.207773: W external/xla/xla/stream_executor/cuda/subprocess_compilation.cc:240] Used ptxas at /usr/local/cuda/bin/ptxas
Traceback (most recent call last):
  File "/home/openpi/scripts/train.py", line 273, in
I updated ptxas to the latest version, but the same error still occurs. I am running inside Docker, and I do not believe PATH is the problem. I would appreciate it if anyone knows how to solve this.
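For anyone trying to reproduce this, here is a minimal sketch of how the ptxas situation could be checked from inside the container. It pulls the compute capability and ptxas path out of the XLA warnings in the log above, and reports the release of whichever ptxas is first on PATH. The helper names (`parse_ptxas_release`, `find_ptxas_fallback`, `report_local_ptxas`) are made up for illustration and are not part of OpenPI or JAX. The regex assumes `ptxas --version` prints a line containing `release <major>.<minor>`.

```python
import re
import shutil
import subprocess


def parse_ptxas_release(version_text):
    """Return (major, minor) parsed from `ptxas --version` output, or None."""
    # Assumption: the output contains a fragment such as "release 12.6".
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_ptxas_fallback(log_text):
    """Extract (compute_capability, ptxas_path) from XLA's fallback warnings.

    Either element is None if the corresponding warning is not in the log.
    """
    cc = re.search(r"ptxas does not support CC (\d+\.\d+)", log_text)
    path = re.search(r"Used ptxas at (\S+)", log_text)
    return (cc.group(1) if cc else None, path.group(1) if path else None)


def report_local_ptxas():
    """Return (path, release) for the first ptxas on PATH, or None if absent."""
    path = shutil.which("ptxas")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, parse_ptxas_release(result.stdout)


if __name__ == "__main__":
    # Compare this with the path XLA printed in "Used ptxas at ...".
    print(report_local_ptxas())
```

If the path printed here differs from the one in the "Used ptxas at" warning, the updated ptxas may not be the binary XLA is actually invoking.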