
Inference of all three stages using GPU?

Open PhilipsKoshy opened this issue 5 years ago • 7 comments

When working with video, we'd like the entire inference pipeline (vehicle detection + license plate detection + OCR) to run as fast as possible, so that we can process as many frames as possible. When I have a frame, I'd like to run all three inference stages on the GPU before starting on the next frame. Otherwise we lose the gain of GPU acceleration by moving the image back and forth between CPU and GPU. Unfortunately, these stages deal with stored files. Any thoughts?
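In outline, what I'm after is a driver loop that keeps every intermediate result in memory. A minimal sketch, where the three stage callables are hypothetical stand-ins for the repo's vehicle-detection, WPOD-NET and OCR steps:

```python
def process_stream(frames, detect_vehicles, detect_plates, run_ocr):
    """Run all three stages on each frame without touching disk.

    Assumed interfaces (not the repo's actual API):
    - detect_vehicles(frame) -> list of vehicle crops
    - detect_plates(crop)    -> list of plate crops
    - run_ocr(plate)         -> recognized plate string
    """
    plates = []
    for frame in frames:
        # Keep every intermediate crop in memory; no imwrite/imread
        # round-trip between stages.
        for vehicle in detect_vehicles(frame):
            for plate in detect_plates(vehicle):
                plates.append(run_ocr(plate))
    return plates
```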

PhilipsKoshy avatar Jul 23 '19 07:07 PhilipsKoshy

Following. Hey, have you worked on this so far? I am thinking about doing it, but I just can't get my head around how to start and what changes I would have to make for this to work.

fadi212 avatar Aug 20 '19 09:08 fadi212

For taking video input, I found the following helpful: https://github.com/sergiomsilva/alpr-unconstrained/issues/57#issuecomment-511706352

But this repo expects images stored as files on disk for every stage (VD, LPD, OCR). I guess that is how the original Darknet expects its input. So we would need to modify the Darknet code to accept an image matrix already in the GPU frame buffer rather than an image file on disk. On top of that, we could try further optimizations like pinned memory and DMA transfers to the GPU frame buffer. I'm not able to try these now; if you do, please share the details.
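To skip the on-disk round-trip, each frame would first have to be converted in memory into the layout Darknet's `image` struct uses: float32, channel-first (planar), RGB order, values normalized to [0, 1]. A NumPy sketch of that conversion, assuming BGR uint8 frames as OpenCV delivers them:

```python
import numpy as np

def frame_to_darknet(frame_bgr):
    """Convert an in-memory BGR uint8 frame (H, W, 3) into Darknet's
    layout: float32, channel-first (3, H, W), RGB order, scaled to
    [0, 1]. The result could then be copied into a Darknet `image`
    struct (or into pinned host memory for a DMA transfer) instead of
    being re-read from a PNG on disk."""
    rgb = frame_bgr[:, :, ::-1]      # BGR -> RGB
    chw = rgb.transpose(2, 0, 1)     # HWC -> CHW (planar)
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0
```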

PhilipsKoshy avatar Aug 29 '19 02:08 PhilipsKoshy

Hi, PhilipsKoshy! My program runs slowly on both my CPU and my GPU: it takes four seconds to process one image. My CPU has 6 cores (running 6 processes) with 8 GB of RAM, and my GPU is a Tesla M10 with 8 GB. How many frames per second can you process, and what is your computing environment?

MingRongXi avatar May 05 '21 15:05 MingRongXi

> Hi, PhilipsKoshy! My program runs slowly on both my CPU and my GPU: it takes four seconds to process one image. My CPU has 6 cores (running 6 processes) with 8 GB of RAM, and my GPU is a Tesla M10 with 8 GB. How many frames per second can you process, and what is your computing environment?

Did you verify that GPU acceleration is actually happening? Did you check with nvidia-smi or a similar utility? Is YOLO built to use the GPU?

PhilipsKoshy avatar May 06 '21 05:05 PhilipsKoshy

> Hi, PhilipsKoshy! My program runs slowly on both my CPU and my GPU: it takes four seconds to process one image. My CPU has 6 cores (running 6 processes) with 8 GB of RAM, and my GPU is a Tesla M10 with 8 GB. How many frames per second can you process, and what is your computing environment?
>
> Did you verify that GPU acceleration is actually happening? Did you check with nvidia-smi or a similar utility? Is YOLO built to use the GPU?

Yes. I compiled Darknet with CUDA and GPU support, so the vehicle-detection and OCR stages are fast. But WPOD-NET is slow even though I ran it with tensorflow-gpu and keras-gpu. On my machine, tensorflow-gpu and plain tensorflow are basically the same speed; both are slow.

MingRongXi avatar May 06 '21 05:05 MingRongXi

I worked on it a while back, so this is from memory: I avoided saving the image to a file; instead, I modified the code to hand the in-memory frame over to the next stage, eliminating the file I/O. If I can dig up my old work, I will post it here.
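In outline, the change replaces the imwrite/imread hop between stages with a direct array handover. A minimal sketch (the `(x, y, w, h)` box format here is just an assumption for illustration):

```python
import numpy as np

# Before (roughly what the repo's scripts do between stages):
#   cv2.imwrite('%s/%s_car.png' % (out_dir, name), crop)  # stage N writes
#   crop = cv2.imread(...)                                # stage N+1 reads
#
# After: crop in memory and pass the array straight to the next stage.
def crop_region(frame, box):
    """Return the region box = (x, y, w, h) of `frame` as its own
    in-memory array, ready to feed directly to the next stage."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w].copy()
```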

PhilipsKoshy avatar May 06 '21 05:05 PhilipsKoshy

Oh, thank you very much! But on my machine, the biggest factor affecting speed is WPOD-NET, not I/O. Do you remember your FPS and computing environment?

MingRongXi avatar May 06 '21 06:05 MingRongXi