face.evoLVe

Face alignment speed up and GPU usage

Open Wenlong0913 opened this issue 5 years ago • 12 comments

To whom it may concern,

This repo provides really amazing tools. Thanks for the great work. I tried face alignment and feature extraction with this library. I found that face alignment can take 1.3 s per image. After reading the code, I realized that MTCNN is not running on the GPU. I made a few small changes, e.g., torch.FloatTensor => torch.cuda.FloatTensor, Pnet() => Pnet().cuda(), etc.

This reduced the face alignment time per image from 1.3 s to 0.8 s. It works, but I am still not satisfied with the result. Is there a way to make face detection/alignment run faster?

There is another thing that confuses me. The GPU usage is very low, 1%~2%. Please see the attachments.

screen shot 2019-02-26 at 12 22 32

I'm not sure whether this is because I didn't configure the GPU properly or whether it is just one of the advantages of this library. The installed CUDA version is 9.2 and the cuDNN version is 7.4. The graphics card is an RTX 2070. An error is reported after I run the Python code. Can anyone tell me how to fix it?

screen shot 2019-02-26 at 12 23 37

Again, many thanks for the great work!

Wenlong0913 avatar Feb 26 '19 04:02 Wenlong0913

@Wenlong0913 Hello, I also want to improve the speed, but there is little improvement after using CUDA. Have you found a better way to accelerate it? If so, could you please tell me? Thank you very much.

jolinlinlin avatar May 16 '19 09:05 jolinlinlin

Nope. Perhaps you can try this implementation. It's faster. https://github.com/Seanlinx/mtcnn

Wenlong0913 avatar May 20 '19 08:05 Wenlong0913

@Wenlong0913 I am facing the same problem. Could you give some hints on how much faster it is than this one? And do you think the cause of the low GPU usage is MTCNN itself or an inefficient implementation? Also, the MTCNN you linked has no landmark prediction. How do you use it to align faces?

flyingmrwang avatar Jan 06 '20 08:01 flyingmrwang

Check out this fork: https://github.com/innerlee/face.evoLVe.PyTorch

Speed Comparison

original

In [1]: from PIL import Image 
   ...: from detector import detect_faces                                                                

In [2]: img = Image.open('../disp/Fig1.png').convert('RGB')                                                             

In [3]: %time detect_faces(img)                                                                                         
CPU times: user 2.85 s, sys: 172 ms, total: 3.02 s
Wall time: 610 ms

the fork

In [1]: from PIL import Image                                                                                           

In [2]: from evolveface import detect_faces, show_results                                                               

In [3]: img = Image.open('disp/Fig1.png').convert('RGB')

In [4]: %time detect_faces(img)                                                                                         
CPU times: user 255 ms, sys: 6.05 ms, total: 261 ms
Wall time: 42.3 ms
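
For readers without IPython, the two numbers that `%time` reports can be approximated with the standard library. The following is a minimal sketch, not code from either repository: `time_call` is a hypothetical helper, and `busy_work` is a stand-in for `detect_faces(img)`. When the CPU time is several times the wall time, as in the first run above (3.02 s versus 610 ms), the work is being spread across multiple threads.

```python
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Return (best_wall_seconds, best_cpu_seconds, result) over several runs."""
    best_wall = best_cpu = float("inf")
    result = None
    for _ in range(repeats):
        wall0 = time.perf_counter()   # wall-clock timer
        cpu0 = time.process_time()    # user + sys CPU time of the whole process
        result = fn(*args, **kwargs)
        cpu = time.process_time() - cpu0
        wall = time.perf_counter() - wall0
        best_wall = min(best_wall, wall)
        best_cpu = min(best_cpu, cpu)
    return best_wall, best_cpu, result


def busy_work(n):
    # Hypothetical stand-in for detect_faces(img); replace with the real call.
    return sum(i * i for i in range(n))


wall, cpu, out = time_call(busy_work, 200_000)
print(f"wall {wall * 1e3:.1f} ms, cpu {cpu * 1e3:.1f} ms")
```

Taking the best of several runs reduces the influence of one-off costs such as lazy imports or cache warm-up on the first call.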

innerlee avatar Mar 18 '20 07:03 innerlee

@innerlee so it's all about pillow-simd?

xxxpsyduck avatar May 05 '20 09:05 xxxpsyduck

There are lots of code optimizations as well.

innerlee avatar May 05 '20 11:05 innerlee

@innerlee Does it affect model performance?

xxxpsyduck avatar May 06 '20 01:05 xxxpsyduck

Purely speed changes. The bottleneck is not model inference.
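
One way to check where the time actually goes is to profile the whole call with `cProfile` from the standard library. This is only a sketch under stated assumptions: `preprocess`, `infer`, and `pipeline` are hypothetical stand-ins for image resizing or box post-processing, the network forward pass, and `detect_faces` respectively, not functions from this repository.

```python
import cProfile
import io
import pstats


def preprocess(n):
    # Hypothetical stand-in for image resizing / box post-processing.
    return [i * i for i in range(n)]


def infer(data):
    # Hypothetical stand-in for the network forward pass.
    return sum(data)


def pipeline():
    # Hypothetical stand-in for detect_faces(img).
    data = preprocess(300_000)
    return infer(data)


profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Print the five entries with the largest cumulative time.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

If most of the cumulative time lands in functions other than the forward pass, moving the model to the GPU will not help much, which would be consistent with the low GPU utilization reported at the top of this thread.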

innerlee avatar May 06 '20 01:05 innerlee

@innerlee I'm checking it out. Great work

xxxpsyduck avatar May 06 '20 01:05 xxxpsyduck

@innerlee Hello, can you share the estimated time for training this repo on the full Celeb-1M dataset? I tried using 4 GPUs, but my estimated time is far too long, like 200 days for a single batch. I cannot believe it. After ten hours of training on only 1/3 of the data, it's still in the first epoch at batch 1920/411750. Can you share your training status? Thanks.

YINDAIYING avatar Jun 03 '20 02:06 YINDAIYING

I use the provided weights for inference. Haven't tried training :shrug:

innerlee avatar Jun 03 '20 02:06 innerlee

image

I also have this issue. I'm running the dataset on a Tesla K80.

PatrickPrakash avatar Nov 30 '20 06:11 PatrickPrakash