torchvision.ops.nms fails on GPU data inside the container nvcr.io/nvidia/l4t-pytorch:r32.4.2-pth1.3-py3, but works as expected on the host OS
Hey @dusty-nv
I've been having an issue running my object detection model within the container nvcr.io/nvidia/l4t-pytorch:r32.4.2-pth1.3-py3. Specifically, when running inference, I call torchvision.ops.nms in order to perform non-maximum suppression on the objects detected by the network. When doing inference in the container, this gives the following error:
File "/usr/local/lib/python3.6/dist-packages/torchvision-0.4.2-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 33, in nms
RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /torchvision/torchvision/csrc/cuda/nms_cuda.cu:127)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x78 (0x7f5e7378d8 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: nms_cuda(at::Tensor const&, at::Tensor const&, float) + 0x710 (0x7f3e0eb51c in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #2: nms(at::Tensor const&, at::Tensor const&, float) + 0x114 (0x7f3e08ae7c in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #3: <unknown function> + 0x73b70 (0x7f3e0bab70 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #4: <unknown function> + 0x70248 (0x7f3e0b7248 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #5: <unknown function> + 0x69718 (0x7f3e0b0718 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #6: <unknown function> + 0x699e4 (0x7f3e0b09e4 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #7: <unknown function> + 0x534a4 (0x7f3e09a4a4 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
<omitting python frames>
frame #9: python3() [0x529958]
frame #11: python3() [0x527860]
frame #12: python3() [0x5297dc]
frame #14: python3() [0x528ff0]
frame #17: python3() [0x5f2bcc]
frame #20: python3() [0x528ff0]
frame #23: python3() [0x5f2bcc]
frame #25: python3() [0x595e5c]
frame #28: python3() [0x528ff0]
frame #31: python3() [0x5f2bcc]
frame #34: python3() [0x528ff0]
frame #37: python3() [0x5f2bcc]
frame #39: python3() [0x595e5c]
frame #41: python3() [0x529738]
frame #43: python3() [0x527860]
frame #44: python3() [0x5297dc]
frame #46: python3() [0x528ff0]
frame #51: __libc_start_main + 0xe0 (0x7f9d2256e0 in /lib/aarch64-linux-gnu/libc.so.6)
frame #52: python3() [0x420e94]
Segmentation fault (core dumped)
To simplify the debugging process, I've come up with a minimal program that gives the same error as above:
import torch
import torchvision
bboxes = [[0.0, 0.0, 2.0, 2.0], [0.75, 0.75, 1.0, 1.0]]
scores = torch.tensor([1., 0.5]).cuda()
boxes = torch.tensor(bboxes).cuda()
keep = torchvision.ops.nms(boxes, scores, 0.7)
print(keep)
When running this code from within the container, I get essentially the same error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/torchvision-0.4.2-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 33, in nms
RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /torchvision/torchvision/csrc/cuda/nms_cuda.cu:127)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x78 (0x7f8adb98d8 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: nms_cuda(at::Tensor const&, at::Tensor const&, float) + 0x710 (0x7f6541151c in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #2: nms(at::Tensor const&, at::Tensor const&, float) + 0x114 (0x7f653b0e7c in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #3: <unknown function> + 0x73b70 (0x7f653e0b70 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #4: <unknown function> + 0x70248 (0x7f653dd248 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #5: <unknown function> + 0x69718 (0x7f653d6718 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #6: <unknown function> + 0x699e4 (0x7f653d69e4 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #7: <unknown function> + 0x534a4 (0x7f653c04a4 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
<omitting python frames>
frame #9: python3() [0x529958]
frame #11: python3() [0x527860]
frame #12: python3() [0x5297dc]
frame #14: python3() [0x528ff0]
frame #15: python3() [0x63075c]
frame #20: __libc_start_main + 0xe0 (0x7fb7aa66e0 in /lib/aarch64-linux-gnu/libc.so.6)
frame #21: python3() [0x420e94]
However, when I run this on the host OS, there are no errors. Here is the output of running jetson_release on that device (note that it has torch 1.3 and torchvision 0.4.2 installed as well):
- NVIDIA Jetson Nano (Developer Kit Version)
* Jetpack 4.4 DP [L4T 32.4.2]
* NV Power Mode: MAXN - Type: 0
* jetson_clocks service: inactive
- Libraries:
* CUDA: 10.2.89
* cuDNN: 8.0.0.145
* TensorRT: 7.1.0.16
* Visionworks: 1.6.0.501
* OpenCV: 4.1.1 compiled CUDA: NO
* VPI: 0.2.0
* Vulkan: 1.2.70
And the output of running the minimum program is tensor([0, 1], device='cuda:0'). Do you know why this program fails to run from within the container?
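For what it's worth, the .cuda() calls above already succeed inside the container, so it looks like only torchvision's compiled NMS kernel is affected rather than CUDA in general. Here is a small diagnostic sketch (nothing beyond the setup above is assumed) that prints the compute capability the device reports and runs plain GPU math, which goes through PyTorch's own kernels rather than the torchvision extension:

import torch

# The Nano reports compute capability (5, 3); a working NMS kernel would
# need a binary built for that arch.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))

# Plain tensor math uses PyTorch's own CUDA kernels, not the torchvision
# extension, so it is expected to work even when nms does not.
x = torch.ones(4, device='cuda')
print((x * 2).sum())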
Hmm, it may be because torchvision was compiled and detected only the GPU arch of the device I built it on (Xavier), instead of the archs that I built PyTorch with (Nano, TX2, Xavier). I will have to investigate how to force the other GPU archs in torchvision.
If you re-build the pytorch container on your Jetson, I think it should work. You can comment out the containers other than pytorch in scripts/docker_build_all.sh and it will build faster.
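For reference, a sketch of how the extra GPU archs can be forced when building torchvision from source (this assumes the build honors TORCH_CUDA_ARCH_LIST, which torch.utils.cpp_extension reads; 5.3/6.2/7.2 are the Nano/TX2/Xavier compute capabilities):

# Sketch: build the torchvision CUDA extension for Nano, TX2 and Xavier
# instead of only the arch of the machine doing the build.
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
git clone -b v0.4.2 https://github.com/pytorch/vision torchvision
cd torchvision
python3 setup.py install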
Ah, interesting. I'll give that a try and post here when I'm done.
Trying to build on my Nano (running L4T 32.4.2) gives this error:
Step 13/17 : RUN git clone -b ${TORCHVISION_VERSION} https://github.com/pytorch/vision torchvision && cd torchvision && python3 setup.py install && cd ../ && rm -rf torchvision && pip3 install "${PILLOW_VERSION}"
---> Running in bf6bfc8a272f
Cloning into 'torchvision'...
Traceback (most recent call last):
File "setup.py", line 14, in <module>
import torch
File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 81, in <module>
from torch._C import *
ImportError: libnvToolsExt.so.1: cannot open shared object file: No such file or directory
The command '/bin/sh -c git clone -b ${TORCHVISION_VERSION} https://github.com/pytorch/vision torchvision && cd torchvision && python3 setup.py install && cd ../ && rm -rf torchvision && pip3 install "${PILLOW_VERSION}"' returned a non-zero code: 1
Have you seen this issue before while building?
Note that this is what I'm running inside docker_build_all.sh (the other installs are commented out):
# PyTorch v1.3.0
build_pytorch "https://nvidia.box.com/shared/static/017sci9z4a0xhtwrb4ps52frdfti9iw0.whl" \
"torch-1.3.0-cp36-cp36m-linux_aarch64.whl" \
"l4t-pytorch:r32.4.2-pth1.3-py3" \
"v0.4.2" \
"pillow<7"
@astekardis did you set your docker default-runtime to nvidia (as shown here: https://github.com/dusty-nv/jetson-containers#docker-default-runtime)? That enables the nvidia runtime to be used during docker build operations.
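For anyone else hitting the ImportError above, the default-runtime setup from the linked README boils down to editing /etc/docker/daemon.json and restarting Docker. A sketch (check the README for the authoritative instructions):

sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker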
Ah, I was trying to build on a device with a fresh install of JetPack 4.4 and had forgotten to do that. Thanks. I'll post when it finishes building the docker image.
Okay, the minimum program works in the image that I locally built. Should I leave this issue open while you look into the issue with the image(s) on nvcr?
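In case it helps anyone else, a one-liner like this can be used to run the minimum program inside a rebuilt image (a sketch; the tag is the one from the build script above, so adjust it if yours differs):

docker run -it --rm --runtime nvidia l4t-pytorch:r32.4.2-pth1.3-py3 \
    python3 -c "import torch, torchvision; b = torch.tensor([[0.0, 0.0, 2.0, 2.0], [0.75, 0.75, 1.0, 1.0]]).cuda(); s = torch.tensor([1., 0.5]).cuda(); print(torchvision.ops.nms(b, s, 0.7))"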
I am having the same issue while building on top of nvcr.io/nvidia/l4t-base:r32.6.1 with a custom-built torchvision (for a Jetson TX2 NX); I will keep this thread posted.
Just want to report that I am also facing this issue on a Jetson Orin using the l4t-ml docker image [JetPack 5.0.2 (L4T R35.1.0)].
@udit7395 I would recommend trying (or building) one of the updated l4t-ml or l4t-pytorch container images:
- https://github.com/dusty-nv/jetson-containers/tree/master/packages/l4t/l4t-ml
- https://github.com/dusty-nv/jetson-containers/tree/master/packages/l4t/l4t-pytorch
@dusty-nv Thanks, I am no longer facing this issue.