
Aborted Error while running detect.py with yolov4-tiny-trt model in Jetson Nano

Open srikar242 opened this issue 4 years ago • 12 comments

Hi @hunglc007, I have converted the yolov4-tiny weights to TensorFlow weights and then converted those into a TensorRT model using your repo. When I run detect.py with the compressed TensorRT model on my system, it works fine. But on the Jetson Nano, the same file with the same code does not work; it gets aborted. I converted the weights to TF on the Nano itself. Below is the error:

2020-09-28 19:38:17.401081: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1205] Loaded TensorRT version: 7.1.3
2020-09-28 19:38:17.445402: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-09-28 19:41:16.717255: F tensorflow/core/kernels/resize_bilinear_op_gpu.cu.cc:493] Non-OK-status: GpuLaunchKernel(kernel, config.block_count, config.thread_per_block, 0, d.stream(), config.virtual_thread_count, images.data(), height_scale, width_scale, batch, in_height, in_width, channels, out_height, out_width, output.data()) status: Internal: too many resources requested for launch
Fatal Python error: Aborted

Thread 0x0000007f996b9010 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60 in quick_execute
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598 in call
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746 in _call_flat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 101 in _call_flat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1645 in _call_impl
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1605 in __call__
  File "detect.py", line 66 in main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250 in _run_main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299 in run
  File "detect.py", line 90 in <module>
Aborted

Any ideas on this?

srikar242 avatar Sep 23 '20 15:09 srikar242

Did you solve it? I'm running into a similar problem.

Accioy avatar Oct 28 '20 07:10 Accioy

Hello @Accioy. In my case it was just a memory issue on the Nano. After solving the memory issue, I didn't get that error anymore.

srikar242 avatar Oct 28 '20 17:10 srikar242
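srikar242 doesn't say exactly how the memory issue was solved. For anyone hitting the same wall, two common mitigations on the Nano are adding swap and stopping TensorFlow from reserving all GPU memory up front. A minimal sketch of the latter (standard TF 2.x API, placed before the model is loaded; this is not necessarily the fix srikar242 used):

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all at start-up.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)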

Hey @srikar242. How did you resolve the memory issue? Did you use a larger SD card than the 4GB one?

parthjdoshi avatar Dec 07 '20 11:12 parthjdoshi

@srikar242 how many fps did you get? Trying to see what I should aim for on mine

bhaktatejas922 avatar Dec 11 '20 21:12 bhaktatejas922

+1 @srikar242 could you explain how you resolved the memory issue please?

I'm doing exactly the same as you (using YOLOv4 in RT on a Jetson Nano), and having exactly the same problem.

I'm running headless, so the system is only using ~400MB, and I increased my swapfile to 16GB. I've also tried reducing the maximum workspace size and maximum batch size set at conversion (to 2GB and 1, respectively).

But I still get the same "too many resources requested for launch" error...

pauljerem avatar Dec 16 '20 12:12 pauljerem
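For reference, this is roughly how the workspace size pauljerem mentions is set during TF-TRT conversion in TF 2.x. The paths below are illustrative, not taken from the repo, and some TF versions also expose a max_batch_size field on the same params object:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# 2 GB workspace, FP16 precision, a single cached engine (illustrative values).
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    max_workspace_size_bytes=2 << 30,
    precision_mode='FP16',
    maximum_cached_engines=1)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='./checkpoints/yolov4-tiny-416',   # hypothetical path
    conversion_params=params)
converter.convert()
converter.save('./checkpoints/yolov4-tiny-trt-416')          # hypothetical path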

So, I figured out from this post that the error relates to CUDA’s maximum number of threads per block being too large.

And, according to this response from Nvidia, config.block_count, which appears in the TensorFlow source file tensorflow/tensorflow/core/kernels/resize_bilinear_op_gpu.cu.cc, needs to be adjusted down (e.g. to 512).

But I don't understand how to set it... Anybody have any ideas?

pauljerem avatar Dec 18 '20 11:12 pauljerem
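config.block_count and config.thread_per_block are computed inside that CUDA kernel file, so changing them means patching the TensorFlow source and rebuilding it, which is a heavy lift on a Nano. If you only want to confirm the per-block limits the launch is bumping into, a quick diagnostic with pycuda (assuming it is installed) looks like this:

import pycuda.driver as cuda

cuda.init()
dev = cuda.Device(0)
# Per-block resource limits of the Nano's Maxwell GPU.
print('max threads per block  :', dev.get_attribute(cuda.device_attribute.MAX_THREADS_PER_BLOCK))
print('max registers per block:', dev.get_attribute(cuda.device_attribute.MAX_REGISTERS_PER_BLOCK))
print('shared memory per block:', dev.get_attribute(cuda.device_attribute.MAX_SHARED_MEMORY_PER_BLOCK))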

@bhaktatejas922 I got around 5 to 6 fps.

srikar242 avatar Dec 28 '20 12:12 srikar242

@arsenal-2004 It was something related to CUDA's threads-per-block limit. I followed a response on an Nvidia forum page to fix it, but I don't remember exactly how, as it was some months back and I've since moved on to other work.

srikar242 avatar Dec 28 '20 12:12 srikar242

I found a solution. The issue can be fixed by adding the following lines at the top of the detect script (before TensorFlow is imported):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

pauljerem avatar Jan 08 '21 11:01 pauljerem

Hi @pauljerem, I tried your solution and it works!

However, I realized that this method ends up not using the Nano's GPU. I verified this with tf.test.is_gpu_available(), which returned False (without the workaround it returned True). I then tried this on another repo that didn't have the issue, and found that adding the workaround slowed the FPS by about 3x because the GPU wasn't being used.

I'm hoping there's another solution that can both fix this issue and also allow the GPU to be used...

leeping-ng avatar Feb 17 '21 11:02 leeping-ng
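As a side note, tf.test.is_gpu_available() is deprecated in more recent TF releases; an equivalent check in TF 2.1 and later is:

import tensorflow as tf

# An empty list means TensorFlow cannot see the GPU and will run on the CPU.
print(tf.config.list_physical_devices('GPU'))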

@leeping-ng Same here, but changing the 1 to 0 did the trick for me, so maybe run:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

Watashi26 avatar Oct 05 '21 11:10 Watashi26
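To spell out why the index matters: the Nano has a single GPU, which CUDA exposes as device 0, so CUDA_VISIBLE_DEVICES='1' hides the GPU entirely and TensorFlow silently falls back to the CPU (which is what @leeping-ng observed), while '0' keeps it visible. The variable has to be set before TensorFlow is imported:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'   # the Nano's only GPU; '1' would hide it and force CPU execution

import tensorflow as tf   # import TF only after the environment variable is set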

@pauljerem were you ever able to find a way to reduce config.block_count down to 512?

pedromarta avatar Apr 19 '23 00:04 pedromarta