
trt.create_inference_graph step in detection.ipynb stuck for long time

Open roarjn opened this issue 4 years ago • 5 comments

Hi, in detection.ipynb I have set score_threshold=0.3 as recommended. The cells above run as expected; however, the trt.create_inference_graph cell never completes. When I run it, top shows a Python process at 100% CPU, then CPU utilization drops to 0 but the cell keeps running. I have left it running for more than 30 minutes. https://github.com/NVIDIA-AI-IOT/tf_trt_models/blob/master/examples/detection/detection.ipynb

Appreciate any help.

roarjn avatar Jun 01 '20 01:06 roarjn

I have a similar issue. I am trying to run detection.ipynb on a Jetson Nano (JetPack 4.3, Python 3.6, TensorFlow 1.15), but when it reaches trt.create_inference_graph() it gets stuck for several minutes and then the kernel restarts. Memory usage is 3.3/3.9 GB and swap is almost empty. The last terminal output is:

2020-06-05 23:51:45.473972: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:633] Number of TensorRT candidate segments: 2
2020-06-05 23:51:45.688493: F tensorflow/core/util/device_name_utils.cc:92] Check failed: IsJobName(job)
[I 23:55:25.776 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel bc86b93e-4a68-4470-a522-7bdfd2c6f95a restarted

Appreciate any help.

dkatsios avatar Jun 05 '20 21:06 dkatsios

Hello, have you ever solved this problem? I am encountering the same issue.

evil-potato avatar Jun 26 '20 14:06 evil-potato

The kernel gets restarted in my case too. I raised an issue with NVIDIA, but their solution didn't work for me.

My current settings are TF 1.15.5, TensorRT 8.0.0, Ubuntu 18.04.

I guess a lot of people are facing this issue when trying to optimize the frozen graph using TensorRT.

Repository owners, please fix this bug.

sachinkmohan avatar Oct 11 '21 13:10 sachinkmohan

Here is the solution to this issue. @dkatsios @roarjn @evil-potato

Add one new parameter, force_nms_cpu=False, to the build_detection_graph call below; it is not present in this repository's version of the code. Also make sure you have the matching TF and JetPack versions installed.

frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,          # pipeline config path, set earlier in the notebook
    checkpoint=checkpoint_path,  # model checkpoint path, set earlier in the notebook
    force_nms_cpu=False,         # keep NMS on the GPU; skips the device remap that crashed the kernel here
    #score_threshold=0.3,
    batch_size=1
)

When I looked closely at the Jupyter terminal, the error pointed to something like this: Could not load dynamic library 'libnvinfer.so.5'. That led me to the links below.
https://github.com/tensorflow/tensorflow/issues/34329
https://forums.developer.nvidia.com/t/tf-trt-error-on-jetson-nano/187611
https://forums.developer.nvidia.com/t/error-while-converting-object-detection-model-to-tensorrt/117127
https://github.com/tensorflow/tensorrt/issues/197
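Since the "Could not load dynamic library 'libnvinfer.so.5'" message means TensorFlow could not resolve the TensorRT runtime at all, it can help to check for the library before attempting conversion instead of letting the kernel die. A minimal sketch (the helper name and the candidate list are my own illustration, not part of this repo or of TF-TRT):

```python
# Sanity-check that a TensorRT runtime (libnvinfer) is resolvable by the
# dynamic linker before running TF-TRT conversion. If this returns None,
# trt.create_inference_graph is likely to hang or crash the kernel as
# described in this thread.
import ctypes.util


def find_nvinfer(candidates=("nvinfer",)):
    """Return the first resolvable TensorRT runtime library, or None.

    ctypes.util.find_library searches the same paths the loader uses,
    so a None result matches TF's "Could not load dynamic library" error.
    """
    for name in candidates:
        path = ctypes.util.find_library(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    lib = find_nvinfer()
    if lib is None:
        print("libnvinfer not found - install/link the TensorRT runtime "
              "matching your TF build before converting")
    else:
        print("Found TensorRT runtime:", lib)
```

On a Jetson, the same check can be done from a shell with `ldconfig -p | grep libnvinfer`; the point is to confirm the library version TF was built against is the one actually installed.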

sachinkmohan avatar Nov 22 '21 16:11 sachinkmohan