[Support]: Docker (0.12.0-beta2-tensorrt) exception trying to load libnvrtc.so (not found)?
Describe the problem you are having
I'm at a loss and hoping for any suggestions. Basically I'm trying to get a TensorRT detector working with blakeblackshear/frigate:0.12.0-beta2-tensorrt (Docker compose config).
I feel like my general NVIDIA configuration is OK, given:
- I was able to generate the trt-models using the tensorrt_models.sh script inside an nvcr.io/nvidia/tensorrt:22.07-py3 container (a rough invocation is sketched right after this list)
- nvidia-smi works in the Frigate container, on the host, and in my other NVIDIA runtime containers.
- ffmpeg hardware acceleration is working fine with the Frigate container using preset-nvidia-h264 and -c:v h264_cuvid
- I'm running other containers which use CUDA, etc.
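For context, the model generation step looked roughly like this (the mount layout and the in-container script location are a sketch on my part, not copied verbatim from my shell history):
# Rough sketch of generating the TensorRT models in the NVIDIA container.
# Host paths and the /tensorrt_models.sh location are assumptions.
docker run --gpus=all --rm -it \
  -v "$(pwd)/trt-models:/tensorrt_models" \
  -v "$(pwd)/tensorrt_models.sh:/tensorrt_models.sh" \
  nvcr.io/nvidia/tensorrt:22.07-py3 /tensorrt_models.sh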
However, when trying to startup a TensorRT detector, I get the following:
Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
Fatal Python error: Aborted
I see libnvrtc.so on both my host and inside the nvcr.io/nvidia/tensorrt:22.07-py3 and other containers, but not inside my Frigate container. So I'm perplexed as to how I can make libnvrtc.so (from CUDA?) available in the container, short of bind mounting /usr/local/cuda-11.7/targets/x86_64-linux/lib/ from the host (I've tried a variety of compose options).
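For anyone who wants to reproduce the comparison quickly, something like this shows the difference (container name taken from my compose file below):
# Host side: libnvrtc is registered with the dynamic linker
ldconfig -p | grep libnvrtc
# Inside the Frigate container the same lookup comes back empty
docker exec -it frigate sh -c 'ldconfig -p | grep libnvrtc'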
Version
blakeblackshear/frigate:0.12.0-beta2-tensorrt
Frigate config file
# I'm using this simplified config to test, which runs fine when moved to CPU detector
mqtt:
  host: mqtt.mydomain.com
  port: 8883
  client_id: frigate
  topic_prefix: frigate
  user: myuser
  password: mypass
  tls_ca_certs: /etc/ssl/certs/ca-certificates.crt
  tls_insecure: false

cameras:
  Front-Door:
    ffmpeg:
      hwaccel_args: preset-nvidia-h264
      input_args:
        - -c:v
        - h264_cuvid
      inputs:
        - path: rtsp://myuser:[email protected]:10554/Streaming/Channels/202
          roles:
            - detect
            - restream
        - path: rtsp://myuser:[email protected]:10554/Streaming/Channels/201
          roles:
            - record
    snapshots:
      enabled: true
    motion:
      mask:
        - 142,28,241,33,241,0,142,0
    detect:
      width: 640
      height: 360

detectors:
  tensorrt:
    type: tensorrt

model:
  path: /trt-models/yolov7-tiny-416.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 416
  height: 416
Relevant log output
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/prepare-logs.sh
cont-init: info: /etc/cont-init.d/prepare-logs.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun frigate (no readiness notification)
services-up: info: copying legacy longrun go2rtc (no readiness notification)
services-up: info: copying legacy longrun nginx (no readiness notification)
s6-rc: info: service legacy-services successfully started
2023-01-11 00:46:53.496196078 07:46:53.496 INF go2rtc version 0.1-rc.6 linux/amd64
2023-01-11 00:46:53.496959381 07:46:53.496 INF [api] listen addr=:1984
2023-01-11 00:46:53.497028236 07:46:53.497 INF [rtsp] listen addr=:8554
2023-01-11 00:46:53.497228724 07:46:53.497 INF [webrtc] listen addr=:8555
2023-01-11 00:46:53.497280472 07:46:53.497 INF [srtp] listen addr=:8443
2023-01-11 00:46:54.639356794 [2023-01-11 00:46:54] frigate.app INFO : Starting Frigate (0.12.0-0dbf909)
2023-01-11 00:46:54.661348602 [2023-01-11 00:46:54] peewee_migrate INFO : Starting migrations
2023-01-11 00:46:54.666553629 [2023-01-11 00:46:54] peewee_migrate INFO : There is nothing to migrate
2023-01-11 00:46:54.674083840 [2023-01-11 00:46:54] ws4py INFO : Using epoll
2023-01-11 00:46:54.690982397 [2023-01-11 00:46:54] detector.tensorrt INFO : Starting detection process: 970
2023-01-11 00:46:54.691723240 [2023-01-11 00:46:54] frigate.app INFO : Output process started: 972
2023-01-11 00:46:54.694029800 [2023-01-11 00:46:54] ws4py INFO : Using epoll
2023-01-11 00:46:54.695904656 [2023-01-11 00:46:54] frigate.app INFO : Camera processor started for Front-Door: 976
2023-01-11 00:46:54.699253070 [2023-01-11 00:46:54] frigate.app INFO : Capture process started for Front-Door: 978
2023-01-11 00:46:55.148182652 [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init CUDA: CPU +188, GPU +0, now: CPU 241, GPU 127 (MiB)
2023-01-11 00:46:55.166258368 [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO : Loaded engine size: 35 MiB
2023-01-11 00:46:55.512402191 [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +192, GPU +74, now: CPU 496, GPU 241 (MiB)
2023-01-11 00:46:55.690972712 [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuDNN: CPU +110, GPU +44, now: CPU 606, GPU 285 (MiB)
2023-01-11 00:46:55.705521956 Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
2023-01-11 00:46:55.705531168 Fatal Python error: Aborted
2023-01-11 00:46:55.705543019
2023-01-11 00:46:55.705547155 Thread 0x00007f6348f9a6c0 (most recent call first):
2023-01-11 00:46:55.705553100 File "/usr/lib/python3.9/threading.py", line 312 in wait
2023-01-11 00:46:55.705558934 File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
2023-01-11 00:46:55.705603275 File "/usr/lib/python3.9/threading.py", line 892 in run
2023-01-11 00:46:55.705639906 File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2023-01-11 00:46:55.705644013 File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2023-01-11 00:46:55.705647504
2023-01-11 00:46:55.705651546 Current thread 0x00007f634d256740 (most recent call first):
2023-01-11 00:46:55.705655880 File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 229 in __init__
2023-01-11 00:46:55.705660139 File "/opt/frigate/frigate/detectors/__init__.py", line 24 in create_detector
2023-01-11 00:46:55.705664586 File "/opt/frigate/frigate/object_detection.py", line 52 in __init__
2023-01-11 00:46:55.705668786 File "/opt/frigate/frigate/object_detection.py", line 97 in run_detector
2023-01-11 00:46:55.705686380 File "/usr/lib/python3.9/multiprocessing/process.py", line 108 in run
2023-01-11 00:46:55.705690779 File "/usr/lib/python3.9/multiprocessing/process.py", line 315 in _bootstrap
2023-01-11 00:46:55.705695155 File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71 in _launch
2023-01-11 00:46:55.705709406 File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19 in __init__
2023-01-11 00:46:55.705730545 File "/usr/lib/python3.9/multiprocessing/context.py", line 277 in _Popen
2023-01-11 00:46:55.705754864 File "/usr/lib/python3.9/multiprocessing/context.py", line 224 in _Popen
2023-01-11 00:46:55.705792265 File "/usr/lib/python3.9/multiprocessing/process.py", line 121 in start
2023-01-11 00:46:55.705818600 File "/opt/frigate/frigate/object_detection.py", line 172 in start_or_restart
2023-01-11 00:46:55.705843911 File "/opt/frigate/frigate/object_detection.py", line 144 in __init__
2023-01-11 00:46:55.705868075 File "/opt/frigate/frigate/app.py", line 214 in start_detectors
2023-01-11 00:46:55.705889471 File "/opt/frigate/frigate/app.py", line 364 in start
2023-01-11 00:46:55.705908039 File "/opt/frigate/frigate/__main__.py", line 16 in <module>
2023-01-11 00:46:55.705937887 File "/usr/lib/python3.9/runpy.py", line 87 in _run_code
2023-01-11 00:46:55.705984158 File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main
2023-01-11 00:47:15.027433642 [2023-01-11 00:47:15] frigate.watchdog INFO : Detection appears to have stopped. Exiting frigate...
s6-rc: info: service legacy-services: stopping
2023-01-11 00:47:15.034035211 exit OK
2023-01-11 00:47:15.034394785 [2023-01-11 00:47:15] frigate.app INFO : Stopping...
2023-01-11 00:47:15.035051550 [2023-01-11 00:47:15] ws4py INFO : Closing all websockets with [1001] 'Server is shutting down'
2023-01-11 00:47:15.035056307 [2023-01-11 00:47:15] frigate.storage INFO : Exiting storage maintainer...
2023-01-11 00:47:15.037505849 [2023-01-11 00:47:15] frigate.events INFO : Exiting event cleanup...
2023-01-11 00:47:15.038340104 [2023-01-11 00:47:15] frigate.record INFO : Exiting recording cleanup...
2023-01-11 00:47:15.038345550 [2023-01-11 00:47:15] frigate.stats INFO : Exiting watchdog...
2023-01-11 00:47:15.038360928 [2023-01-11 00:47:15] frigate.record INFO : Exiting recording maintenance...
2023-01-11 00:47:15.038635641 [2023-01-11 00:47:15] frigate.watchdog INFO : Exiting watchdog...
2023-01-11 00:47:15.038826899 [2023-01-11 00:47:15] frigate.events INFO : Exiting event processor...
s6-svwait: fatal: supervisor died
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
FFprobe output from your camera
N/A
Frigate stats
N/A
Operating system
Debian
Install method
Docker Compose
Coral version
Other
Network connection
Wired
Camera make and model
N/A
Any other information that may be helpful
nvidia-smi inside the container (the ffmpeg process doesn't show up here, but it does in nvidia-smi and nvtop on the host):
root@frigate:/opt/frigate# nvidia-smi
Wed Jan 11 00:53:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P2000 Off | 00000000:51:00.0 Off | N/A |
| 52% 45C P0 16W / 75W | 74MiB / 5120MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Looking for libs in Frigate container:
root@frigate:/opt/frigate# ldconfig -p |grep libcudnn_cnn_infer
<null>
root@frigate:/opt/frigate# ldconfig -p |grep libnvrtc
<null>
root@frigate:/opt/frigate# find / -name libcudnn_cnn_infer* -print
/usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libcudnn_cnn_infer.so.8
root@frigate:/opt/frigate# find / -name libnvrtc* -print
<null>
Looking for libs inside nvcr.io/nvidia/tensorrt:22.07-py3 used to generate /trt-models:
root@docker:/ # docker run -it --rm nvcr.io/nvidia/tensorrt:22.07-py3 sh -c 'ldconfig -p |grep libcudnn_cnn_infer'
libcudnn_cnn_infer.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
libcudnn_cnn_infer.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so
root@docker:/ # docker run -it --rm nvcr.io/nvidia/tensorrt:22.07-py3 sh -c 'ldconfig -p |grep libnvrtc'
libnvrtc.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.11.2
libnvrtc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so
libnvrtc-builtins.so.11.7 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.7
libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so
Docker compose file (several other variations tried with same result):
version: "3.7"
services:
frigate:
container_name: frigate
hostname: frigate
image: blakeblackshear/frigate:0.12.0-beta2-tensorrt
privileged: true
runtime: nvidia
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
shm_size: "256mb"
volumes:
- /etc/localtime:/etc/localtime:ro
- /storage/docker/frigate/config.yml:/config/config.yml:ro
- /storage/docker/frigate/storage:/media/frigate
- /storage/docker/frigate/trt-models:/trt-models
- type: tmpfs
target: /tmp/cache
tmpfs:
size: 1000000000
ports:
- "127.0.0.1:9049:5000"
environment:
FRIGATE_RTSP_PASSWORD: "somepassword"
NVIDIA_VISIBLE_DEVICES: all
NVIDIA_DRIVER_CAPABILITIES: compute,utility,video
restart: unless-stopped
Thanks in advance for ANY ideas! 👍
Hi, first: thanks for all this work! Looking forward to having GPU detection and restream. I very much appreciate your work!
I have the same issue as Codelica. Without the GPU for detection, it works out of the box. Using the GPU version of ffmpeg is also working fine, and model generation ran without any issues. Only getting the whole thing together breaks with the error above.
Cc @NateMeyer
@damsport11 What GPU are you using?
Yes, @damsport11 can you clarify what GPU and Host OS you're running?
I think we saw some users on UnRaid have this issue, related to the drivers that were installed on the Host.
I'll look into what could be going on, but my first hunch is it's an issue between the container and the host driver.
System is Ubuntu 20.04, GPU is an RTX 2060 12 GB, NVIDIA-SMI 525.60.11, CUDA V11.6.124, cuDNN 8.4.1. Thanks for the help ;-)
FWIW, I bind mounted /usr/local/cuda-11.7/targets/x86_64-linux/lib/libnvrtc.so.11.7.99 from the host side to /usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libnvrtc.so in the container, and everything came to life with detections working, etc. I'm just not sure if that should be magically getting passed in via some more official mechanism. :)
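In compose terms the workaround is just a volume entry along these lines (host path per my CUDA 11.7 install; it will differ on other hosts and CUDA versions):
    volumes:
      # Workaround sketch: bind mount the host's libnvrtc into the location
      # where the bundled cuDNN libraries expect to find it.
      - /usr/local/cuda-11.7/targets/x86_64-linux/lib/libnvrtc.so.11.7.99:/usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libnvrtc.so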
Yes, I can confirm this is a workaround ;-)
Can you see if you are able to update the CUDA libraries on your host?
I could give it a shot tonight, but what should I target? I'm at 11.7.0 currently so there is 11.7.1, 11.8.0 or 12.0.0.
I believe the image is pulling in 11.7.1 libraries, so I would expect 11.7.1 to work. The 11.x drivers are supposed to be backwards compatible, so installing 11.8 shouldn't hurt. I've done my testing with 12.0 installed, and it worked fine with the 11.7.1 runtime libraries in the image.
So I would expect any of them to work. Would you mind stepping through them? If 11.7.1 works, we could add that to the documentation as a minimum version needed. I'll see if I can do similar testing this weekend.
I have the same issue even with 12.0 installed on my host. Initially I didn't have the nvrtc libs installed on my host, but even after installing them, only the above-mentioned workaround of bind mounting libnvrtc.so.12 into the container worked for me.
I can try some other versions this weekend, although I'm pretty doubtful the NVIDIA Docker runtime will pass libnvrtc.so from the host. I'm definitely not an expert, but that seems to go beyond what the NVIDIA toolkit passes in via the Docker runtime. I took a look at a couple other containers I run which seem to make use of NVRTC and both have it packaged in the image (via installation of a libnvrtc package).
It looks like Frigate is bringing in the NVIDIA resources via Python, so my wild ass Friday guess is that adding nvidia-cuda-nvrtc-cu11 to requirements-tensorrt.txt may do the trick.
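For anyone who wants to sanity-check that guess before an image change, something along these lines inside the container should show whether the pip wheel supplies the missing library (the dist-packages location is my assumption of where the wheel unpacks):
# Untested sketch: install the NVRTC runtime wheel and see where libnvrtc lands.
pip3 install nvidia-cuda-nvrtc-cu11
find /usr/local/lib/python3.9/dist-packages/nvidia -name 'libnvrtc*'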
OK, thanks, that is helpful. I'll try to recreate this over the weekend. Have you also tried with the beta3 image?
I am seeing the issue with beta3. I think @Codelica is right on.
I'm on beta3 at this point also, which acts the same. Was hoping it might resolve a couple little items I've been trying to track down but am not confident enough to call bugs yet :)
I am having the same issue with beta3 on Debian 11. Installing libnvrtc11.2 and mounting the library as mentioned above worked too:
- /usr/lib/x86_64-linux-gnu/libnvrtc.so.11.2:/usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libnvrtc.so
OK. I've added the nvrtc library and reworked the library loading a little bit. Can you please try this test image to see if it resolves the issue?
ghcr.io/natemeyer/frigate:0.12.0-0baae65-tensorrt
@BBaoVanC Which GPU are you using?
I'm wondering if there is something about how the model is generated, or whether certain GPU arch optimizations rely on nvrtc when others don't. Or, more likely, libcudnn_cnn_infer.so is only used in certain scenarios, and it happens to depend on nvrtc.
My GPU is a GTX 1050, Driver Version: 525.60.13, CUDA Version: 12.0. I am using the yolov7-tiny-416 model.
@NateMeyer I just tried with your image and still get the error:
2023-01-14 09:41:22.576908660 Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
2023-01-14 09:41:22.576911440 Fatal Python error: Aborted
Thanks @dennispg, I'll keep looking. I'm able to force loading of libcudnn_cnn_infer.so.8 on my side, and it doesn't complain about libnvrtc.
Aha! I've recreated this issue by regenerating the models.
Running the yolov4-tiny-416 model instead of yolov7 does not complain.
I included which model I am using because I had a feeling it might be something like that... glad you were able to pinpoint it!
New test image with symlink: ghcr.io/natemeyer/frigate:0.12.0-9c641ec-tensorrt
ghcr.io/natemeyer/frigate:0.12.0-9c641ec-tensorrt is working fine for me without the external bind mount. 👍
Awesome! Thanks so much for helping troubleshoot this.