TrainYourOwnYOLO
Multiple streams in multiple GPUs and in multiple processes finally working
Hi, @AntonMu:
After many long nights, I have finally achieved true YOLO multiprocessing using one, two, many GPUs for one, two, many completely separate YOLO instances/processes at high speed.
- By subdividing the GPU memory of one card, I can process two (and possibly more) video streams on one GPU in completely independent processes, each with its own separate model if necessary. On a 1080ti working on two streams of 1020x780 color video, I get a YOLO rate of 14-15 fps per stream, or a total of around 30 frames per second run through YOLO.
- I can run the two independent streams on two independent GPUs, each working for its own process, one per stream. This gives me 22-23 fps per stream on a 1080ti and 11-12 fps on a 1060, both running in the same machine at the same time. With two 1080ti in the machine, I get 22-23 fps on both streams, or 44-46 fps total. The cameras I'm using can't do better than 22-23 fps anyway; it's possible that it will go a few frames higher. A round trip through YOLO is 0.036 sec, which comes out to a theoretical max rate of 28 fps, minus other overhead. A 2080ti should achieve more than 25 fps.
- Combinations of the above are very much possible. Four streams at around 14-15 fps each should be doable (with a big A/C; each card draws around 200 watts...)
- This has been achieved with minimal changes to yolo.py, mainly in the initialization of the YOLO class. The main routine is a major rewrite, because it is video-centric, and because I am working with config files, one for each stream and process.
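The fractional-GPU part of the above relies on capping how much GPU memory each process may grab before loading its model. As a rough sketch of how that can be done in TensorFlow 2.x (this is not the actual yolo.py change; the 1024 MB figure is an assumption to tune per model, and older TF 2.x releases use the `tf.config.experimental.set_virtual_device_configuration` alias instead):

```python
import tensorflow as tf

# Sketch: cap the GPU memory visible to this process so several
# independent YOLO processes can share one physical card.
# 1024 MB is an assumed figure -- tune it to your model.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)],
    )
```

Run once per process, before the model is built; TensorFlow otherwise claims the whole card for the first process that starts.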
I would like to donate this to the cause. How would you want me to go about it? I will probably need a week or two for cleaning up the code anyway. Please let me know.
Hi @bertelschmitt ,
That is great to hear. I'm definitely happy to have you contribute your findings to this repo.
You should write it up and then make a pull request to this repo.
Thank you!
Will do. I'm still testing. I managed to squeeze up to 13 independent YOLO processes at a time into one 1080ti. I'm also coming across some wrinkles: for instance, the 2nd GPU not working at all in an older motherboard, despite being recognized by nvidia-smi. Works great in a new mobo. Also, antique GPUs (tested with GTX 670 and 760) seem to be out.
Will come back when matters have stabilized.
B
Update: Core functionality (i.e. changes to yolo.py to accommodate multiple and fractional GPUs) appears to be solid (as much as I can claim that from testing on two machines.) Working on a sample app using the new interface to capture, classify, display and store multiple streams using multiple and/or fractional GPUs and Python multiprocessing.
@AntonMu: Which branch do you want me to use as the basis for the changes? Currently using a master from early 2020. I plan to annotate my changes with a tag like #BS071220. Is that OK, or do you want something else?
@bertelschmitt
can you please share the code?
@shahzaibraza37, I will. Give me a few days for cleanup.
pls do share.
@johnjaiharjose: Soon. Am in the middle of cleanup. Here is an appetizer:
Woah, truly amazing. Waiting badly for your code.
I see you have 2 models for each camera view. A few questions popped up in my mind. Hope you don't mind.
- Are you using 4 GPUs and 2 models?
- Is it possible to run 2 models (different camera views, like in your case) on a single GPU after memory limiting (e.g. 2 GB per model, at the same time, inference only)?
- Please briefly explain your data pipeline in the cat application.
My implementation will allow you to
- run as many INDEPENDENT processes per GPU as your GPU and system memory allow
- run on as many GPUs as you can fit into your machine
- use anything from one model shared by all processes to as many separate models as you have processes
For me, a process takes a little less than 1 GByte of GPU memory. An 11 GByte 1080ti can accommodate up to 11 processes. (The example runs its 4 processes easily on one GPU, even a cheaper one.)
Two 1080ti: 22 processes.
Each process now demands around 2.5 GByte of main memory (TensorFlow 2.X is a bit more memory-greedy than 1.X). 22 x 2.5 = 55 GByte... You'd better have the memory to go along with the GPUs.
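The capacity arithmetic above can be sanity-checked in a couple of lines. The per-process figures are the ones quoted (observed on one setup, not guarantees):

```python
# Rough capacity check for N independent YOLO processes,
# using the per-process figures quoted above (observations, not guarantees).
GPU_GB_PER_PROC = 1.0    # GPU memory per process
RAM_GB_PER_PROC = 2.5    # main memory per process (TensorFlow 2.x)

def max_processes(gpu_mem_gb, sys_mem_gb):
    """How many processes fit, limited by whichever runs out first."""
    by_gpu = int(gpu_mem_gb // GPU_GB_PER_PROC)
    by_ram = int(sys_mem_gb // RAM_GB_PER_PROC)
    return min(by_gpu, by_ram)

# Two 11 GB 1080ti cards and 64 GB of system RAM:
print(max_processes(2 * 11, 64))  # -> 22
```

With only 16 GB of RAM, the same two cards would be RAM-limited to 6 processes, which is the point of the warning above.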
The pipeline for many processes is the same as for one:
On initialization, the main module reads and parses the config file, and turns it into as many config settings as there are processes. Main launches each process with its custom settings. Main launches a Master process that acts as a communication and management hub for all processes. Main then keeps checking that all processes are still alive, and otherwise does nothing. In each process, the flow is as follows:
- Frame is captured from the video source as specified (can be different for each process)
- Frame is run through the YOLO model for inference (model can be different for each process)
- Result is displayed in a window (different for each process)
- Result can optionally (and automatically on object detection) be saved to an individual video file
- Optionally, a file with per-frame metadata can be saved (different for each process). This file can later be used for very efficient training.
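The flow above can be sketched with Python's multiprocessing. Everything here (`stream_worker`, the config fields, the placeholder sources) is hypothetical illustration, not the actual MultiDetect.py code:

```python
import multiprocessing as mp

def stream_worker(cfg):
    """One independent capture/detect/display loop (placeholder logic)."""
    import cv2  # imported in the child so each process keeps its own state
    cap = cv2.VideoCapture(cfg["source"])    # per-process video source
    # ... load this process's own YOLO model here, with its GPU memory cap ...
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # boxes = yolo.detect_image(frame)   # per-process model inference
        cv2.imshow(cfg["window"], frame)     # per-process display window
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()

if __name__ == "__main__":
    # Placeholder sources; in practice these come from the config file.
    configs = [{"source": "cam0.mp4", "window": "cam0"},
               {"source": "cam1.mp4", "window": "cam1"}]
    procs = [mp.Process(target=stream_worker, args=(c,)) for c in configs]
    for p in procs:
        p.start()
    # Main does little besides making sure the workers are still alive.
    for p in procs:
        p.join()
```

The real application additionally runs a master process as a communication hub between workers, which this sketch omits.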
The project comes in two parts:
- A modified yolo.py - This has already been running for months 24/7 on multiple GPUs and is stable.
- MultiDetect.py, an application that makes use of the modified yolo.py. This is turning into a monster: it requires heavy inter-process communication, it uses a massive config file for endless customization, and it uses tkinter for status and user interaction. It is 99% done, but there is always something. I hope to be finished in a few days. I believe MultiDetect.py is necessary to provide a good "out-of-the-box experience", so to speak. I want the user to be able to fire it up and see something happening.
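To give a feel for per-stream configuration, one section of such a config file might look like the fragment below. The field names are invented for illustration only; the real MultiDetect.py config will differ:

```ini
[stream_1]
video_source     = rtsp://camera1.local/stream   ; capture source for this process
model_path       = Data/Model_Weights/trained_weights_final.h5
gpu_id           = 0                             ; which physical GPU to use
gpu_memory_mb    = 1024                          ; slice of the card granted to this process
record_on_detect = yes                           ; auto-save video when an object is detected
save_metadata    = yes                           ; write per-frame metadata for later training
```

One such section per stream gives each launched process its own source, model, and GPU slice.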
Here is an image of 18 separate processes running on two GPUs:
@bertelschmitt Thank you so much for your detailed and well explained answers! This clears so many things for me! 👍
Daijoubu ("no problem"), as we say in Japan
Downloading Pretrained Weights
Downloaded Pretrained Weights in 6.8 seconds
Detecting Cat Faces by calling:
python /content/TrainYourOwnYOLO/3_Inference/Detector.py --input_path /content/TrainYourOwnYOLO/Data/Source_Images/Test_Images --classes /content/TrainYourOwnYOLO/Data/Model_Weights/data_classes.txt --output /content/TrainYourOwnYOLO/Data/Source_Images/Test_Image_Detection_Results --yolo_model /content/TrainYourOwnYOLO/Data/Model_Weights/trained_weights_final.h5 --box_file /content/TrainYourOwnYOLO/Data/Source_Images/Test_Image_Detection_Results/Detection_Results.csv --anchors /content/TrainYourOwnYOLO/2_Training/src/keras_yolo3/model_data/yolo_anchors.txt --file_types .jpg .jpeg .png
2021-03-01 15:16:10.514250: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
File "/content/TrainYourOwnYOLO/3_Inference/Detector.py", line 21, in <module>
from keras_yolo3.yolo import YOLO, detect_video, detect_webcam
File "/content/TrainYourOwnYOLO/2_Training/src/keras_yolo3/yolo.py", line 18, in <module>
from keras.utils import multi_gpu_model
ImportError: cannot import name 'multi_gpu_model' from 'keras.utils' (/usr/local/lib/python3.7/dist-packages/keras/utils/__init__.py)
Detected Cat Faces in 2.1 seconds
This error comes up often in Colab itself. How do I solve it?
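`multi_gpu_model` was removed from Keras (it disappeared around TensorFlow/Keras 2.4), which is why current Colab images fail on that import. Until the import is dropped from yolo.py, one workaround is pinning an older TensorFlow/Keras pair before running the detector; treat the exact versions below as an assumption to adjust, not a tested recommendation:

```
# requirements pin (workaround sketch, versions are assumptions)
tensorflow==2.3.*
keras==2.3.*
```

Alternatively, since TF 2.x replaced `multi_gpu_model` with `tf.distribute.MirroredStrategy`, simply deleting the import from yolo.py also works if you don't need multi-GPU training.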
@bertelschmitt Sorry to bother you this late, but I'm curious to know whether you have hosted this challenging work anywhere for the public?
No, sorry. Just using it for myself. I have switched to Coral TPUs lately. Much cheaper, and they use way less power. Also, implemented in just a few lines of Python.