Object detection is very slow

Open isabek opened this issue 7 years ago • 92 comments

I have installed CUDA & OpenCV for object detection as described in the docs.

Then I tried to detect objects in a video file and got roughly 2.8 FPS.

./darknet detector demo cfg/coco.data cfg/yolo.cfg yolo.weights video-file.mp4

Then I tried to use cuDNN, but the result was 2.0 FPS.

What should I do to get higher FPS?

Environment:

  • Ubuntu 16.04
  • CUDA 8.0.61
  • OpenCV 3.2
  • GPU: GeForce 840M (2GB)
  • Driver: NVIDIA 375.66

isabek avatar Jul 13 '17 22:07 isabek

@Isabek You should get about 7 FPS on a GeForce 840M.

What is the resolution in your video file?

AlexeyAB avatar Jul 13 '17 22:07 AlexeyAB

@AlexeyAB 1920x1080. Is that too big? What resolution is suitable for darknet?

isabek avatar Jul 13 '17 22:07 isabek

Try decreasing the width=416 and height=416 settings in the yolo.cfg file: https://github.com/pjreddie/darknet/blob/master/cfg/yolo.cfg#L8
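For example, the cfg change can be scripted. This is a minimal sketch; the cfg text below is a stand-in for the `[net]` section, not the real yolo.cfg, and the function name is illustrative:

```python
import re

def set_network_resolution(cfg_text, width, height):
    """Rewrite the width= and height= lines of a darknet .cfg [net] section."""
    cfg_text = re.sub(r"(?m)^width=\d+", f"width={width}", cfg_text)
    cfg_text = re.sub(r"(?m)^height=\d+", f"height={height}", cfg_text)
    return cfg_text

# Stand-in for the top of yolo.cfg
cfg = "[net]\nbatch=1\nwidth=608\nheight=608\nchannels=3\n"
print(set_network_resolution(cfg, 416, 416))
```

Lower network resolution means fewer convolutions per frame, which is why FPS goes up (at the cost of accuracy on small objects).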

A video resolution of 1920x1080 is normal for Yolo, but you can also try a lower-resolution video file.

AlexeyAB avatar Jul 13 '17 22:07 AlexeyAB

I have decreased width and height in the config file. Now FPS is ~5.4, which is good, but still not enough. Btw, I tried a lower-resolution video, but the result was the same.

isabek avatar Jul 13 '17 22:07 isabek

@AlexeyAB thank you.

isabek avatar Jul 13 '17 22:07 isabek

Hi @Isabek, I'm running into the same problem you described and wonder if you've solved it?

DennisWangCW avatar Jul 17 '17 06:07 DennisWangCW

Hi @DennisWangCW, if you want to reach the very high FPS shown in the darknet documentation, you need this kind of computer.

Btw, you can train your own model, since the stock YOLO weights try to detect a lot of object classes.

P.S. I couldn't solve it. As @AlexeyAB said, I can reach only 7.5 FPS with my GPU which is not enough for me.

isabek avatar Jul 17 '17 10:07 isabek

@Isabek Hi,

You can use Tiny-Yolo instead of Yolo, so you can get about ~15 FPS on GeForce 840M: ./darknet detector demo cfg/coco.data cfg/tiny-yolo.cfg tiny-yolo.weights video-file.mp4

  • tiny-yolo.cfg: https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/tiny-yolo.cfg
  • tiny-yolo.weights: https://pjreddie.com/media/files/tiny-yolo.weights

Also, you can decrease width and height to 288 in tiny-yolo.cfg, which should get you about ~30 FPS on a GeForce 840M.

But each such step worsens the detection accuracy.

AlexeyAB avatar Jul 17 '17 10:07 AlexeyAB

Thank you @AlexeyAB.

I decreased width and height to 288 in tiny-yolo.cfg and changed video file resolution to 960x540. Now FPS is ~25 which is cool. FYI @DennisWangCW

isabek avatar Jul 18 '17 10:07 isabek

@AlexeyAB how can I train my own model? Is it possible? I have my own image dataset.

isabek avatar Jul 18 '17 10:07 isabek

@Isabek Yes: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

AlexeyAB avatar Jul 18 '17 10:07 AlexeyAB

Hi @AlexeyAB. I am a bit confused about absolute_x and absolute_height. Could you explain them? Thanks!

isabek avatar Jul 18 '17 14:07 isabek

@Isabek Say you have a 1920x1080 image, and object-1 with center (100,200), width=50 and height=20. For this object: absolute_x = 100, absolute_y = 200, absolute_width = 50, absolute_height = 20.

Following <x> = <absolute_x> / <image_width> and <height> = <absolute_height> / <image_height>, you should write into the txt file: 1 0.052 0.185 0.026 0.0185
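The arithmetic above can be sketched as a small helper (the function name is illustrative, not part of darknet):

```python
def to_yolo_format(class_id, abs_x, abs_y, abs_w, abs_h, img_w, img_h):
    """Convert an absolute box center/size into YOLO's normalized <x> <y> <w> <h>."""
    return (class_id,
            abs_x / img_w, abs_y / img_h,
            abs_w / img_w, abs_h / img_h)

# The example above: a 1920x1080 image, box centered at (100, 200), 50x20 px
cid, x, y, w, h = to_yolo_format(1, 100, 200, 50, 20, 1920, 1080)
print(f"{cid} {x:.3f} {y:.3f} {w:.3f} {h:.4f}")  # -> 1 0.052 0.185 0.026 0.0185
```

Note that x and w are normalized by the image width, while y and h are normalized by the image height.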

Or just use this tool: https://github.com/AlexeyAB/Yolo_mark

AlexeyAB avatar Jul 18 '17 15:07 AlexeyAB

@AlexeyAB I have 6 classes. How many images should I choose for each class? I selected 100 images per class, and the weights file after 500 iterations is 256 MB.

isabek avatar Jul 19 '17 13:07 isabek

@Isabek I still haven't solved the problem, but thank you anyway.

DennisWangCW avatar Jul 25 '17 09:07 DennisWangCW

@Isabek Usually 500 - 2000 images per class (object) is enough. And you should train for (2000 x number_of_classes) iterations.
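As a quick sanity check, this rule of thumb for the 6-class case above works out as follows (the helper is mine, not part of darknet):

```python
def training_plan(num_classes, images_per_class=1000):
    """Thread's rule of thumb: 500-2000 images per class,
    and about 2000 * num_classes training iterations."""
    return {"images": num_classes * images_per_class,
            "iterations": 2000 * num_classes}

print(training_plan(6))  # -> {'images': 6000, 'iterations': 12000}
```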

AlexeyAB avatar Jul 25 '17 14:07 AlexeyAB

@AlexeyAB thank you. I started training 2 days ago, and it is still training my model :)

isabek avatar Jul 26 '17 11:07 isabek

I played around with the yolo.cfg and tiny-yolo.cfg (width=416 and height=416) config files, but performance lags at around 4 FPS when reading from a video file and writing output to another video file (AVI). What kind of performance should developers expect on the Nvidia TX1 platform?

Also, when I run the web cam demo, I get about 10-12 FPS with tiny-yolo and a Sony Playstation Eye. Do I need to set the camera resolution somewhere?

nvidia@tegra-ubuntu:~/darknet$ ./darknet detector demo cfg/voc.data cfg/tiny-yolo-voc.cfg weights/tiny-yolo-voc.weights

FPS:11.2

Thx.

kaisark avatar Oct 15 '17 22:10 kaisark

I'm seeing about the same performance on the TX2. With tiny-yolo there is a lot of headroom on the TX2 to spawn more processes; I'm able to get ~30 FPS with 4 simultaneous processes.

TheMikeyR avatar Oct 16 '17 09:10 TheMikeyR

@TheMikeyR Did you say 30 FPS on the TX2? Is that the capture rate or the processing rate? What does your config/setup (hw/sw) look like for tiny-yolo? Are you using OpenCV 3? My understanding is that OpenCV is not very good at video I/O (ffmpeg/gstreamer). What type of camera are you using?

kaisark avatar Oct 18 '17 20:10 kaisark

@kaisark I'm processing offline video. I run sudo jetson_clocks.sh from the home directory (it should be installed with JetPack) and then nvpmodel -m 2, which turns on all cores (from 4 to 6) in MAXP mode. I didn't modify the original tiny-yolo much; I have one class to predict, so of course I modified the filters and classes accordingly.
I compiled with CUDNN=1 GPU=1 OPENCV=1, but then removed the "viewing" part of the demo function, so it doesn't display the results and only prints to the terminal.
I uncommented this line https://github.com/pjreddie/darknet/blob/c7252703420159a9f3a1ec416b1b4326c4c95402/src/demo.c#L194 to prevent OpenCV from displaying the video, which speeds it up (it still shows the detected objects in the terminal).
Lastly, I open 4 terminals and run the same command in all of them: ./darknet detector demo data/rgb.data cfg/tiny-yolo.cfg ~/data/create/detection_annotation/yolo/tiny/tiny-yolo_50000.weights ~/data/create/videos/summarized_right_10min.mp4 -i 0
I believe the reported FPS covers the entire processing pipeline, since it is computed in one place in demo.c and is not updated until execution reaches that place again.

The camera used is the ZED camera, taking only the right view of the RGB stream.

TheMikeyR avatar Oct 19 '17 06:10 TheMikeyR

@Isabek Hi, did you train your own model with fewer object classes? Does it improve FPS?

xhuvom avatar Oct 20 '17 13:10 xhuvom

@xhuvom I have trained my own model on my collected dataset, but the result is the same: 3.7 FPS. You can watch the result here: https://www.youtube.com/watch?v=QopUtQobWJ0

isabek avatar Oct 23 '17 10:10 isabek

@AlexeyAB I would like to buy a new video card, and I am a little bit confused. What is the difference between MSI, Asus, EVGA and Zotac?

I am planning to buy GTX 1070. How many frames per second can I reach with GTX 1070 on YOLO?

isabek avatar Nov 03 '17 12:11 isabek

@Isabek Primarily the cooler; some of the cards are also factory overclocked (run faster). Here is a list http://thepcenthusiast.com/geforce-gtx-1070-compared-asus-evga-zotac-msi-gigabyte/ where you can filter and compare by clock speed etc. In the end it doesn't matter much; you can also overclock the card yourself and just go with the cheaper one. It's a silicon lottery: sometimes you get a chip that can overclock a lot, and other times you can't get anything over stock speeds.

Can't help with FPS, depends on many things.

TheMikeyR avatar Nov 03 '17 13:11 TheMikeyR

@Isabek You can achieve about ~0.01 FPS per 1 GFlops-SP using the yolo-voc.cfg network at 416x416 on my fork.

Look at Single precision for your GPU: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_series

  • so on GeForce GTX 1070 - 6462 GFlops-SP = ~64 FPS
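The estimate can be written out explicitly. The 0.01 FPS-per-GFLOPS figure is the rough number given above for yolo-voc at 416x416, and the helper name is illustrative:

```python
FPS_PER_GFLOPS_SP = 0.01  # rough figure from this thread (yolo-voc.cfg, 416x416)

def estimated_fps(gflops_sp):
    """Back-of-the-envelope FPS estimate from a GPU's single-precision GFLOPS."""
    return gflops_sp * FPS_PER_GFLOPS_SP

# GeForce GTX 1070: 6462 GFLOPS-SP
print(estimated_fps(6462))  # roughly 64 FPS
```

This is only a scaling heuristic; actual throughput also depends on memory bandwidth, cuDNN version, and the rest of the pipeline.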

As TheMikeyR correctly said, GPUs from different vendors differ mainly in factory overclocking and the cooling system, and they can also differ in reliability.

AlexeyAB avatar Nov 03 '17 13:11 AlexeyAB

I am a bit confused about precision. How can I calculate the model's precision? I want to compute it inside the function that calculates recall and IoU. I found an answer to my question, but I am not sure about it.

isabek avatar Nov 21 '17 13:11 isabek

@AlexeyAB I need a recommendation for the kind/brand/model of camera to use for training, detection and recognition of multiple faces at a time from a live stream (e.g. a check-in counter). Please help.

jTariq avatar Dec 05 '17 14:12 jTariq

@TheMikeyR Hey, I am also working on the Jetson TX2, and I get the following FPS with sudo nvpmodel -m 0 and sudo ./jetson_clocks.sh:

  • Tiny-Yolo: 17.5 FPS
  • YoloV2: 2.7 FPS
  • Google's Object Detection API with SSD_MobileNet: 4 FPS

What do you get, and how could I speed this up?

I documented my problem a little bit more in detail if you have a look here: https://devtalk.nvidia.com/default/topic/1027819/jetson-tx2/object-detection-performance-jetson-tx2-slower-than-expected/

Would be nice hearing from you!

gustavz avatar Dec 20 '17 09:12 gustavz

@GustavZ I see the same performance on the TX2 with sudo nvpmodel -m 0 and sudo ./jetson_clocks.sh:

  • Tiny-Yolo: 17.5 FPS
  • YoloV2: 2.7 FPS

So I'll move on to TensorRT with JetPack 3.2 beta.

OseongKwon avatar Dec 28 '17 04:12 OseongKwon