Low GPU and CPU usage during inference / realtime detection
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): 1.4 with GPU
- Bazel version (if compiling from source): newest
- CUDA/cuDNN version: CUDA 9 / cuDNN 7
- GPU model and memory: Laptop: GeForce GTX 1050 (4 GB); Jetson TX2: Tegra (8 GB)
- Exact command to reproduce: clone my repo https://github.com/GustavZ/realtime_object_detection and run object_detection.py
Describe the problem
I am using SSD MobileNet for realtime inference with a webcam as input via OpenCV, and I get the following performance: Laptop: ~25 fps at ~40% GPU and ~25% CPU usage; Jetson: ~5 fps at ~5-10% GPU and ~10-40% CPU usage.
Any hints on why the Object Detection API is so slow at inference? Training may be easy and fast, fine, but inference (actually using the models for realtime object detection) is very slow and does not fully utilize the GPU. (For comparison, YOLO with darknet runs at 90-100% GPU usage with 3x higher fps.)
Here is a screenshot of what nvidia-smi and top report while running inference on the laptop.
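For reference, a minimal sketch of the kind of loop I mean (not the exact code from my repo; the file name and tensor names follow the standard exported frozen graph):

```python
# TF 1.x sketch: load a frozen SSD MobileNet graph once,
# then run detection per webcam frame and print the achieved FPS.
import time
import cv2
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    fetches = [graph.get_tensor_by_name(name + ':0') for name in
               ('detection_boxes', 'detection_scores',
                'detection_classes', 'num_detections')]
    cap = cv2.VideoCapture(0)                       # default webcam
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # OpenCV delivers BGR; convert to RGB for correct detections
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        start = time.time()
        boxes, scores, classes, num = sess.run(
            fetches,
            feed_dict={image_tensor: np.expand_dims(rgb, axis=0)})
        print('FPS: %.1f' % (1.0 / (time.time() - start)))
    cap.release()
```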
@jch1 @tombstone is the performance at expected levels?
It would also be nice if someone could tell me how to properly call optimize_for_inference.py on the pre-trained ssd_mobilenet_v1_coco frozen model. I chose image_tensor as the input node and detection_boxes, detection_scores, num_detections, detection_classes as the output nodes. The script ran without errors, but using the optimized graph failed. See this question for more details: https://stackoverflow.com/questions/48212068/error-using-model-after-using-optimize-for-inference-py-on-frozen-graph
This would certainly increase my inference performance :) !
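For context, this is a minimal TF 1.x sketch of the equivalent library call. One thing worth checking: image_tensor is a uint8 placeholder, and optimize_for_inference.py defaults the placeholder type to float32, which may be why the optimized graph fails (an assumption; see the linked question):

```python
# Sketch: optimize the frozen SSD MobileNet graph via the library API,
# passing the correct uint8 dtype for the image_tensor placeholder.
import tensorflow as tf
from tensorflow.python.tools import optimize_for_inference_lib

graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

optimized = optimize_for_inference_lib.optimize_for_inference(
    graph_def,
    ['image_tensor'],                          # input node
    ['detection_boxes', 'detection_scores',    # output nodes
     'num_detections', 'detection_classes'],
    tf.uint8.as_datatype_enum)                 # image_tensor is uint8

with tf.gfile.GFile('optimized_inference_graph.pb', 'wb') as f:
    f.write(optimized.SerializeToString())
```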
I have a similar issue. Trying to run a Mask R-CNN model on an OpenCV webcam feed, but only 10% of the GPU is being utilized. Any tips on how to increase GPU utilization?
I also have a similar issue.
In TensorFlow 1.5, GPU utilization is very low and inference runs slower than on CPU.
In TensorFlow 1.4, GPU utilization is slightly higher than in 1.5, but the FPS is still the same as running on CPU.
This is my code: https://gist.github.com/rocking5566/a284bebf5f39640d6eae6f744f74c2d2
Similar issue on a GTX 1050; GPU usage is around 10-15%.
When I run the SSD detector continuously in a loop (with no other processes or additional delays), GPU-Util is around 40-42% and FPS is around 20.
However, when I run the SSD detector with some delay between each call (around 100-200 ms; in my real use case, multiple threads access the detect function, hence the small delay), GPU-Util drops down to 15% and FPS drops to around just 10.
Please suggest how to increase GPU usage.
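Something like the following sketch reproduces the two scenarios (hypothetical; `sess`, `fetches`, and `feed` are assumed to come from an already-loaded detection graph, and only the time spent inside sess.run is counted):

```python
# Compare detection-only FPS for back-to-back calls vs. calls with a gap.
import time

def measure_fps(sess, fetches, feed, n_runs=100, delay=0.0):
    """Average FPS over n_runs sess.run calls, excluding the sleep time."""
    total = 0.0
    for _ in range(n_runs):
        if delay:
            time.sleep(delay)              # simulate work between detect() calls
        start = time.time()
        sess.run(fetches, feed_dict=feed)
        total += time.time() - start       # count only sess.run time
    return n_runs / total

# fps_tight   = measure_fps(sess, fetches, feed)              # continuous loop
# fps_delayed = measure_fps(sess, fetches, feed, delay=0.15)  # ~150 ms gaps
```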
@heethesh Same problem here on a 1050 Ti: 9-10% GPU usage. What is happening? 🤷‍♂️
I have faced the same problem. How can I solve it? Any help is appreciated, please.
I have the same problem. Using pure CPU, I see only about 15% CPU usage and low FPS.
How could the response time be improved if the model is hosted on a k8s cluster and accessed through requests.post()? There is post-processing in my scenario, but even when just the model response is collected, the ssd_inception_v2 model still takes 2-3 seconds.
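To isolate where the time goes, a hypothetical sketch that times only the HTTP round trip. The endpoint URL, port, and base64 input signature are all assumptions; they depend on how the model is exported and served (e.g. with TensorFlow Serving):

```python
# Time only the model round trip to a (hypothetical) TF Serving REST endpoint.
import base64
import time
import requests

URL = 'http://<cluster-ip>:8501/v1/models/ssd_inception_v2:predict'  # assumed endpoint

with open('test.jpg', 'rb') as f:
    # assumes the model was exported with an encoded_image_string_tensor input
    payload = {'instances': [{'b64': base64.b64encode(f.read()).decode()}]}

start = time.time()
resp = requests.post(URL, json=payload, timeout=30)
print('model round trip: %.2f s' % (time.time() - start))
print('status:', resp.status_code)
```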
Any updates on the issue? I face the same problem. GPU utilization is 4-9%.
Any updates? Same problem here; utilization is between 1 and 3%.
Same problem. Any fixes are appreciated.
Same problem here
Same problem here, any solution?