Low GPU and CPU usage during inference / realtime detection
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): 1.4 with GPU
- Bazel version (if compiling from source): newest
- CUDA/cuDNN version: CUDA 9 / cuDNN 7
- GPU model and memory: Laptop: GeForce GTX 1050 (4 GB); Jetson TX2: Tegra (8 GB)
- Exact command to reproduce: clone my repo https://github.com/GustavZ/realtime_object_detection and run object_detection.py
Describe the problem
I am using SSD MobileNet for realtime inference with a webcam as input via OpenCV, and I get the following performance: Laptop: ~25 fps at ~40% GPU and ~25% CPU usage; Jetson: ~5 fps at ~5-10% GPU and ~10-40% CPU usage.
Any hints on why the Object Detection API is so slow at inference? Training may be easy and fast, fine, but inference (actually using the models for realtime object detection) is very slow and does not fully utilize the GPU. (For comparison, YOLO with darknet runs at 90-100% GPU usage with 3x higher fps.)
Here is a screenshot of what nvidia-smi and top report while running inference on the laptop.
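For reference, a minimal sketch of the kind of loop I mean (not the exact code from my repo; the file name and tensor names follow the standard exported frozen graph):

```python
# TF 1.x sketch: load a frozen SSD MobileNet graph once,
# then run detection per webcam frame and print the achieved FPS.
import time
import cv2
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    fetches = [graph.get_tensor_by_name(name + ':0') for name in
               ('detection_boxes', 'detection_scores',
                'detection_classes', 'num_detections')]
    cap = cv2.VideoCapture(0)                       # default webcam
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # OpenCV delivers BGR; convert to RGB for correct detections
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        start = time.time()
        boxes, scores, classes, num = sess.run(
            fetches,
            feed_dict={image_tensor: np.expand_dims(rgb, axis=0)})
        print('FPS: %.1f' % (1.0 / (time.time() - start)))
    cap.release()
```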
@jch1 @tombstone is the performance at expected levels?
It would also be nice if someone could tell me how to properly call optimize_for_inference.py on the pre-trained ssd_mobilenet_v1_coco frozen model. I chose image_tensor as the input node and detection_boxes, detection_scores, num_detections, detection_classes as the output nodes. The script ran without errors, but using the optimized graph failed. See this question for more details: https://stackoverflow.com/questions/48212068/error-using-model-after-using-optimize-for-inference-py-on-frozen-graph
This would certainly increase my inference performance :) !
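For context, this is a minimal TF 1.x sketch of the equivalent library call. One thing worth checking: image_tensor is a uint8 placeholder, and optimize_for_inference.py defaults the placeholder type to float32, which may be why the optimized graph fails (an assumption; see the linked question):

```python
# Sketch: optimize the frozen SSD MobileNet graph via the library API,
# passing the correct uint8 dtype for the image_tensor placeholder.
import tensorflow as tf
from tensorflow.python.tools import optimize_for_inference_lib

graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

optimized = optimize_for_inference_lib.optimize_for_inference(
    graph_def,
    ['image_tensor'],                          # input node
    ['detection_boxes', 'detection_scores',    # output nodes
     'num_detections', 'detection_classes'],
    tf.uint8.as_datatype_enum)                 # image_tensor is uint8

with tf.gfile.GFile('optimized_inference_graph.pb', 'wb') as f:
    f.write(optimized.SerializeToString())
```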
I have a similar issue. Trying to run a Mask R-CNN model on an OpenCV webcam feed, but only 10% of the GPU is being utilized. Any tips on how to increase GPU utilization?
I also have a similar issue.
In TensorFlow 1.5, GPU utilization is very low and inference runs slower than on CPU.
In TensorFlow 1.4, GPU utilization is slightly higher than in 1.5, but the FPS is still the same as running on CPU.
This is my code: https://gist.github.com/rocking5566/a284bebf5f39640d6eae6f744f74c2d2
Similar issue on a GTX 1050; GPU usage is around 10-15%.
When I run the SSD detector continuously in a loop (with no other processes or additional delays), GPU-Util is around 40-42% and FPS is around 20.
However, when I run the SSD detector with some delay between each call (around 100-200 ms; in my real use case, multiple threads access the detect function, hence the small delay), GPU-Util drops down to 15% and FPS drops to around just 10.
Please suggest how to increase GPU usage.
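Something like the following sketch reproduces the two scenarios (hypothetical; `sess`, `fetches`, and `feed` are assumed to come from an already-loaded detection graph, and only the time spent inside sess.run is counted):

```python
# Compare detection-only FPS for back-to-back calls vs. calls with a gap.
import time

def measure_fps(sess, fetches, feed, n_runs=100, delay=0.0):
    """Average FPS over n_runs sess.run calls, excluding the sleep time."""
    total = 0.0
    for _ in range(n_runs):
        if delay:
            time.sleep(delay)              # simulate work between detect() calls
        start = time.time()
        sess.run(fetches, feed_dict=feed)
        total += time.time() - start       # count only sess.run time
    return n_runs / total

# fps_tight   = measure_fps(sess, fetches, feed)              # continuous loop
# fps_delayed = measure_fps(sess, fetches, feed, delay=0.15)  # ~150 ms gaps
```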
@heethesh Same problem here on a 1050 Ti: 9-10% GPU usage. What is happening? 🤷‍♂️
I have faced the same problem. How can I solve it? Any help is appreciated, please.
I have the same problem. Using pure CPU, I see only about 15% CPU usage and low FPS.
How could the response time be improved if the model is hosted on a k8s cluster and accessed through requests.post()? There is post-processing in my scenario, but even when just the model response is collected, the ssd_inception_v2 model still takes 2-3 seconds.
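To isolate where the time goes, a hypothetical sketch that times only the HTTP round trip. The endpoint URL, port, and base64 input signature are all assumptions; they depend on how the model is exported and served (e.g. with TensorFlow Serving):

```python
# Time only the model round trip to a (hypothetical) TF Serving REST endpoint.
import base64
import time
import requests

URL = 'http://<cluster-ip>:8501/v1/models/ssd_inception_v2:predict'  # assumed endpoint

with open('test.jpg', 'rb') as f:
    # assumes the model was exported with an encoded_image_string_tensor input
    payload = {'instances': [{'b64': base64.b64encode(f.read()).decode()}]}

start = time.time()
resp = requests.post(URL, json=payload, timeout=30)
print('model round trip: %.2f s' % (time.time() - start))
print('status:', resp.status_code)
```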
Any updates on the issue? I face the same problem. GPU utilization is 4-9%.
Any updates? Same problem here; utilization is between 1 and 3%.
Same problem. Any fixes are appreciated.
Same problem here
Same problem here, any solution?