run_dev.sh failing
I'm trying to use isaac_ros_common
Upon executing ./run_dev.sh I get the following error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all, nvidia.com/pva=all: unknown.
Here is the full, verbose, output:
$ ./run_dev.sh -v
Launching Isaac ROS Dev container with image key aarch64.ros2_humble: /home/nvidia/workspaces/isaac_ros-dev/
Building aarch64.ros2_humble base as image: isaac_ros_dev-aarch64
Building layered image for key aarch64.ros2_humble as isaac_ros_dev-aarch64
Using configured docker search paths: /home/nvidia/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts/../docker
Checking if base image nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a exists on remote registry
Found pre-built base image: nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a
aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a: Pulling from nvidia/isaac/ros
Digest: sha256:69b1a8b4373fce2a57ab656cd7c7e2a714f685cfd62168418caeaa216d4315a0
Status: Image is up to date for nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a
nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a
Finished pulling pre-built base image: nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a
Nothing to build, retagged nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a as isaac_ros_dev-aarch64
Running isaac_ros_dev-aarch64-container
+ docker run -it --rm --privileged --network host --ipc=host -v /tmp/.X11-unix:/tmp/.X11-unix -v /home/nvidia/.Xauthority:/home/admin/.Xauthority:rw -e DISPLAY -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all -e ROS_DOMAIN_ID -e USER -e ISAAC_ROS_WS=/workspaces/isaac_ros-dev -e HOST_USER_UID=1000 -e HOST_USER_GID=1000 -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all,nvidia.com/pva=all -v /usr/bin/tegrastats:/usr/bin/tegrastats -v /tmp/:/tmp/ -v /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra -v /usr/src/jetson_multimedia_api:/usr/src/jetson_multimedia_api --pid=host -v /usr/share/vpi3:/usr/share/vpi3 -v /dev/input:/dev/input -v /run/jtop.sock:/run/jtop.sock:ro -v /home/nvidia/workspaces/isaac_ros-dev/:/workspaces/isaac_ros-dev -v /etc/localtime:/etc/localtime:ro --name isaac_ros_dev-aarch64-container --runtime nvidia --entrypoint /usr/local/bin/scripts/workspace-entrypoint.sh --workdir /workspaces/isaac_ros-dev isaac_ros_dev-aarch64 /bin/bash
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all, nvidia.com/pva=all: unknown.
+ cleanup
+ for command in "${ON_EXIT[@]}"
+ popd
~/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts
Setup:
- Hardware: Jetson Orin Nano Dev Kit
- Software: Jetpack 6.1
Thanks for any help with this issue
Thanks for raising this. We had seen this on pre-release builds of JetPack 6.0 and VPI 3.2.1, which required running the pre-requisite steps listed here. These steps should not have been necessary on a JetPack 6.1 machine, however, and we had not seen it since. Could you run these steps on your Jetson and let us know if that resolves it for you? If so, we'll update the instructions and the troubleshooting docs while we try to reproduce the issue on our end too.
Thanks for your feedback. I was not successful in running the steps in the provided link. Here is what I have done so far:
Setting up the Jetson from scratch
Since my initial post, I reinstalled Jetpack. I did this because I was afraid that some other installations I had done on the system could somehow be interfering with building the image. I have done the following:
- I reinstalled Jetpack 6.1 (rev. 1) using the SDK Manager. I installed Jetpack directly on the SSD on the Jetson.
- After installation I installed nvidia-jetpack using apt:
sudo apt install nvidia-jetpack
- Running jtop provides the following information about my installation:
- Libraries
- CUDA: 12.6.68
- cuDNN: 9.3.0.75
- TensorRT: 10.3.0.30
- VPI: 3.2.4
- Vulkan: 1.3.204
- OpenCV: 4.8.0 with CUDA: NO
- Hardware
- Model: NVIDIA Jetson Orin Nano Developer Kit
- Module: NVIDIA Jetson Orin Nano (Developer Kit)
- L4T: 36.4.2
- Jetpack: MISSING
- I believe the fact that it says "MISSING" is related to the following warning when I execute jtop:
[WARN] jetson-stats not supported for [L4T 36.4.2].
- Running dpkg -l | grep nvidia-jetpack yields:
ii  nvidia-jetpack          6.1+b123  arm64  NVIDIA Jetpack Meta Package
ii  nvidia-jetpack-dev      6.1+b123  arm64  NVIDIA Jetpack dev Meta Package
ii  nvidia-jetpack-runtime  6.1+b123  arm64  NVIDIA Jetpack runtime Meta Package
These are the only system-setup steps I have performed that are not directly related to Isaac.
Isaac setup
I set up Isaac according to this: https://nvidia-isaac-ros.github.io/getting_started/dev_env_setup.html .
I skipped the parts of Step 1 about moving Docker over to the SSD, since my installation is already on the SSD. However, after my first failed attempt to build isaac_ros_common on the new JetPack installation, I added "default-runtime": "nvidia" to the daemon.json file. Also, when testing Docker on the SSD I was not able to pull nvcr.io/nvidia/l4t-base:r36.4.2; this might be expected, I just wanted to try that tag since it matches my current version of L4T. However, pulling nvcr.io/nvidia/l4t-base:r35.2.1 as described in the docs worked.
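For reference, the relevant part of my /etc/docker/daemon.json now looks roughly like this (a sketch; the runtimes block is whatever nvidia-ctk runtime configure --runtime=docker had generated, so yours may differ):
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "default-runtime": "nvidia"
}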
I followed steps 2 - 4 and set up the workspace under ~/workspaces/isaac_ros-dev/src, since my installation is on an SSD and not an SD card.
isaac_ros_common
I cloned the isaac_ros_common repo into ~/workspaces/isaac_ros-dev/src/ with the following command: git clone -b release-3.2 https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common.git isaac_ros_common. Right afterwards I ran run_dev.sh.
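For reference, the full sequence was roughly:
$ cd ~/workspaces/isaac_ros-dev/src
$ git clone -b release-3.2 https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common.git isaac_ros_common
$ cd isaac_ros_common && ./scripts/run_dev.sh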
This yielded the following error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all, nvidia.com/pva=all: unknown.
Running pre-requisite steps:
nvidia-smi returns the following:
$ nvidia-smi
Tue Dec 17 14:41:35 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.4.0 Driver Version: 540.4.0 CUDA Version: 12.6 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Orin (nvgpu) N/A | N/A N/A | N/A |
| N/A N/A N/A N/A / N/A | Not Supported | N/A N/A |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
According to this it seems like the CUDA driver is recognized, considering that CUDA Version says 12.6. I found it worrisome that no stats are available for the GPU. However, according to some googling, it seems like this is to be expected on Jetson devices?
Checking the NVIDIA Container Toolkit version yields:
$ nvidia-container-toolkit --version
NVIDIA Container Runtime Hook version 1.14.2
commit: 807c87e057e13fbd559369b8fd722cc7a6f4e5bb
To me, this looks good. However, running nvidia-ctk gives:
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
INFO[0000] Auto-detected mode as "nvml"
ERRO[0000] failed to generate CDI spec: failed to create device CDI specs: failed to generate CDI edits for GPU devices: error visiting device: failed to get edits for device: failed to create device discoverer: error getting GPU device minor number: Not Supported
Since this did not work, I checked whether a YAML file already existed for CDI and found the following:
$ cat /etc/cdi/nvidia-pva.yaml
---
cdiVersion: 0.5.0
containerEdits:
  mounts:
  - containerPath: /run/nvidia-pva-allowd
    hostPath: /run/nvidia-pva-allowd
    options:
    - ro
    - nosuid
    - nodev
    - bind
  hooks:
  - path: /usr/bin/nvidia-pva-hook
    hookName: createContainer
    args:
    - nvidia-pva-hook
    - -d
    - /etc/pva/allow.d
    - create
  - path: /usr/bin/nvidia-pva-allow
    hookName: createContainer
    args:
    - nvidia-pva-allow
    - update
  - path: /usr/bin/nvidia-pva-hook
    hookName: poststop
    args:
    - nvidia-pva-hook
    - -d
    - /etc/pva/allow.d
    - remove
  - path: /usr/bin/nvidia-pva-allow
    hookName: poststop
    args:
    - nvidia-pva-allow
    - update
devices:
- name: "0"
  containerEdits:
    env:
    - NVIDIA_PVA_DEVICE=0
- name: all
  containerEdits:
    env:
    - NVIDIA_PVA_DEVICE=all
kind: nvidia.com/pva
I therefore proceeded to the next step, hoping that the existing YAML file would be sufficient; however, I was met with another error:
$ sudo nvidia-ctk runtime configure --runtime=docker --cdi.enabled=true
Incorrect Usage: flag provided but not defined: -cdi.enabled
NAME:
NVIDIA Container Toolkit CLI runtime configure - Add a runtime to the specified container engine
USAGE:
NVIDIA Container Toolkit CLI runtime configure [command options] [arguments...]
OPTIONS:
--dry-run update the runtime configuration as required but don't write changes to disk (default: false)
--runtime value the target runtime engine; one of [containerd, crio, docker] (default: "docker")
--config value path to the config file for the target runtime
--config-mode value the config mode for runtimes that support multiple configuration mechanisms
--oci-hook-path value the path to the OCI runtime hook to create if --config-mode=oci-hook is specified. If no path is specified, the generated hook is output to STDOUT.
Note: The use of OCI hooks is deprecated.
--nvidia-runtime-name value specify the name of the NVIDIA runtime that will be added (default: "nvidia")
--nvidia-runtime-path value, --runtime-path value specify the path to the NVIDIA runtime executable (default: "nvidia-container-runtime")
--nvidia-runtime-hook-path value specify the path to the NVIDIA Container Runtime hook executable (default: "/usr/bin/nvidia-container-runtime-hook")
--nvidia-set-as-default, --set-as-default set the NVIDIA runtime as the default runtime (default: false)
--help, -h show help (default: false)
ERRO[0000] flag provided but not defined: -cdi.enabled
Conclusion
I was not able to fix the issue using the steps from your previous comment. Is there something else I'm missing? Thank you for your assistance.
I was able to start the container after commenting out the below line in run_dev.sh (line 243) as a temporary workaround.
DOCKER_ARGS+=("-e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all,nvidia.com/pva=all")
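For anyone applying the same workaround from a shell, a rough sketch (match on the content rather than a fixed line number, since it can shift between releases; double-check what the pattern matches in scripts/run_dev.sh before running it):
$ sed -i 's|^\(.*NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all,nvidia.com/pva=all.*\)$|# \1|' scripts/run_dev.sh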
+1, I'm following exactly the same steps and hitting the same issue as @Flipsack.
Setup:
- Hardware: Jetson Orin AGX Dev Kit 64GB
- Software: Jetpack 6.1
I was able to get the container running after commenting out the line that @mickey13 suggested.
Able to replicate.
# R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
nvidia@nvidia-desktop:/mnt/nova_ssd/workspaces/isaac_ros-dev/src/isaac_ros_common$ cd ${ISAAC_ROS_WS}/src/isaac_ros_common && ./scripts/run_dev.sh
Launching Isaac ROS Dev container with image key aarch64.ros2_humble: /mnt/nova_ssd/workspaces/isaac_ros-dev/
Building aarch64.ros2_humble base as image: isaac_ros_dev-aarch64
Building layered image for key aarch64.ros2_humble as isaac_ros_dev-aarch64
Using configured docker search paths: /mnt/nova_ssd/workspaces/isaac_ros-dev/src/isaac_ros_common/scripts/../docker
Checking if base image nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a exists on remote registry
Found pre-built base image: nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a
aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a: Pulling from nvidia/isaac/ros
Digest: sha256:69b1a8b4373fce2a57ab656cd7c7e2a714f685cfd62168418caeaa216d4315a0
Status: Image is up to date for nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a
nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a
Finished pulling pre-built base image: nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a
Nothing to build, retagged nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_deaea1a392d5c02f76be3f4651f4b65a as isaac_ros_dev-aarch64
Running isaac_ros_dev-aarch64-container
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all, nvidia.com/pva=all: unknown.
/mnt/nova_ssd/workspaces/isaac_ros-dev/src/isaac_ros_common
Able to fix by removing the CDI device request, following @mickey13's workaround.
@Flipsack I followed your steps from https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common/issues/163#issuecomment-2548528534 and encountered the same errors.
To me, this looks good. However, running nvidia-ctk gives:
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
INFO[0000] Auto-detected mode as "nvml"
ERRO[0000] failed to generate CDI spec: failed to create device CDI specs: failed to generate CDI edits for GPU devices: error visiting device: failed to get edits for device: failed to create device discoverer: error getting GPU device minor number: Not Supported
This issue was reported and solved here: https://forums.developer.nvidia.com/t/podman-gpu-on-jetson-agx-orin/297734/10?u=development7. The fix is to force csv format:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml --mode=csv
I still got the same error:
$ sudo nvidia-ctk runtime configure --runtime=docker --cdi.enabled=true
Incorrect Usage: flag provided but not defined: -cdi.enabled
on the next step, but this time ./run_dev.sh worked.
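For anyone else checking the fix, one way to confirm that the generated spec actually resolves the devices run_dev.sh asks for (assuming your nvidia-ctk build ships the cdi list subcommand) is:
$ nvidia-ctk cdi list
Once the CSV-mode spec is in place, the listed devices should include nvidia.com/gpu=all and nvidia.com/pva=all.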
This may be because of an older version of the NVIDIA Container Toolkit (see here on how to update to at least 1.16). It is possible the JetPack upgrade from 6.0 to 6.1 did not update the NCT for you. Updating should resolve this without any workarounds and keep PVA accessible within the dev container as intended.
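For reference, the usual upgrade path looks roughly like this (a sketch assuming the NVIDIA Container Toolkit apt repository is already configured on the device):
$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ nvidia-ctk --version
The reported version should then be 1.16 or newer.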
Alternatively, I had run into the same issues listed here on pre-release JP6.0 and was able to get things mostly working on NVIDIA Container Toolkit 1.14 (I could not upgrade to 1.16 because the update list was too extensive) by using the --mode=csv workaround AND adding the following to my /etc/docker/daemon.json:
{ "features": { "cdi": true }, "cdi-spec-dirs": ["/etc/cdi/", "/var/run/cdi"] }
Same issue here on Jetson Orin Nano Devkit.
@hemalshahNV I was unable to install anything newer than NCT 1.14.2 on my Jetson following your link, even after configuring the experimental packages.
@beniaminopozzan After following your fix ./run_dev.sh worked for me as well.
I'm also unable to update NCT beyond 1.14.2 on a Jetson Orin NX flashed with JP6.1 (rev1).