tensorflow-yolov4-tflite
Aborted error while running detect.py with the yolov4-tiny-trt model on Jetson Nano
Hi @hunglc007, I have converted the yolov4-tiny weights into TensorFlow weights and then converted those into a TensorRT model using your repo. Running detect.py with the converted TensorRT model works fine on my system, but on the Jetson Nano the same file with the same code gets aborted. I converted the weights into TF on the Nano itself. Below is the error:
2020-09-28 19:38:17.401081: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1205] Loaded TensorRT version: 7.1.3
2020-09-28 19:38:17.445402: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-09-28 19:41:16.717255: F tensorflow/core/kernels/resize_bilinear_op_gpu.cu.cc:493] Non-OK-status: GpuLaunchKernel(kernel, config.block_count, config.thread_per_block, 0, d.stream(), config.virtual_thread_count, images.data(), height_scale, width_scale, batch, in_height, in_width, channels, out_height, out_width, output.data()) status: Internal: too many resources requested for launch
Fatal Python error: Aborted
Thread 0x0000007f996b9010 (most recent call first):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60 in quick_execute
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598 in call
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746 in _call_flat
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 101 in _call_flat
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1645 in _call_impl
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1605 in __call__
File "detect.py", line 66 in main
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250 in _run_main
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299 in run
File "detect.py", line 90 in <module>
Aborted
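For reference, the call that aborts is the saved-model inference step in detect.py (line 66 in the traceback above). A rough sketch of that path, using the standard TF 2.x SavedModel API, with placeholder paths and input size (this is only an illustration, not the repo's exact code):

import cv2
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

# Placeholder path to the TensorRT-converted SavedModel produced by the conversion step.
saved_model_dir = './checkpoints/yolov4-tiny-trt-416'

# Load the converted model and grab its default serving signature.
saved_model = tf.saved_model.load(saved_model_dir, tags=[tag_constants.SERVING])
infer = saved_model.signatures['serving_default']

# Preprocess a test image to the network input size (assuming 416x416, float32 in [0, 1]).
image = cv2.cvtColor(cv2.imread('./data/kite.jpg'), cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (416, 416)).astype(np.float32) / 255.0
batch = tf.constant(image[np.newaxis, ...])

# On the Nano this call dies inside resize_bilinear_op_gpu.cu.cc with
# "too many resources requested for launch"; on the desktop it runs fine.
predictions = infer(batch)
for name, tensor in predictions.items():
    print(name, tensor.shape)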
Any ideas on this?
Did you solve it? I'm facing a similar problem.
Hello @Accioy. In my case, it was just a memory issue on the Nano. After solving the memory issue, I didn't get that error.
Hey @srikar242. How did you resolve the memory issue? Did you use a larger SD card than the 4GB one?
@srikar242 how many fps did you get? Trying to see what I should aim for on mine
+1 @srikar242 could you explain how you resolved the memory issue please?
I'm doing exactly the same as you (using YOLOv4 in RT on a Jetson Nano), and having exactly the same problem.
I'm running headless, so the system is only using ~400MB, and I increased my swapfile to 16GB. I've also tried reducing the maximum workspace size and maximum batch size set at conversion (to 2GB and 1, respectively).
But I still get the same "too many resources requested for launch" error...
So, I figured out from this post that the error means the kernel launch is requesting more resources (threads per block) than the Nano's GPU can provide.
And, according to this response from Nvidia, config.block_count, which appears in the TensorFlow source file tensorflow/tensorflow/core/kernels/resize_bilinear_op_gpu.cu.cc, needs to be adjusted down (e.g. to 512).
But I don't understand how to set it... Anybody have any ideas?
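In case it helps anyone reproduce this, the workspace size I mention above is set when the TensorRT engine is built. A minimal sketch of the TF-TRT conversion, assuming the standard trt_convert API in TF 2.x (paths are placeholders; the batch-size option is left out because it differs between TF versions):

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths; substitute your own SavedModel directories.
input_saved_model_dir = './checkpoints/yolov4-tiny-416'
output_saved_model_dir = './checkpoints/yolov4-tiny-trt-416'

# Start from the default conversion parameters and shrink the workspace
# the TensorRT builder is allowed to use (here 2 GB, as tried above).
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    max_workspace_size_bytes=2 << 30,
    precision_mode='FP16')

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=params)
converter.convert()
converter.save(output_saved_model_dir=output_saved_model_dir)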
@bhaktatejas922 I got around 5 to 6 fps.
@arsenal-2004 It was something related to CUDA's threads-per-block limit. I followed a response on an Nvidia forum page to fix it, but I don't remember exactly what I did, as it was some months back and I've since moved on to another topic.
I found a solution. The issue can be fixed by adding the following lines at the top of the detect script (before TensorFlow is imported):
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
Hi @pauljerem, I tried your solution and it works!
However, I realized that this method ends up not using the Nano's GPU. I verified this with tf.test.is_gpu_available(), which returned False (without the workaround it returned True). I then tested this on another repo that didn't have this issue, and found that adding the workaround slowed the FPS by about 3x because the GPU was not being used.
I'm hoping there's another solution that can both fix this issue and also allow the GPU to be used...
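For anyone who wants to run the same check, this is the quick test I mean, assuming TF 2.x (tf.test.is_gpu_available() still works but newer releases prefer tf.config.list_physical_devices):

import os

# With '1', the Nano's single GPU (device 0) is hidden, so TensorFlow falls back to the CPU;
# that avoids the kernel-launch abort but causes the ~3x slowdown described above.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))  # [] when the GPU is hidden
print(tf.test.is_gpu_available())              # False when the GPU is hidden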
Same here, but changing the 1 to 0 did the trick for me (on the Nano the only GPU is device 0, so '1' hides the GPU entirely and TensorFlow falls back to the CPU). So maybe run:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
@pauljerem regarding the config.block_count value mentioned above, were you able to find a way to reduce it down to 512?