PyTorch/TensorFlow
Hello there,
I recently stumbled upon this repository and was interested in trying out your code. However, single-threaded sklearn doesn't seem very efficient compared to GPU-optimized PyTorch or TF.
Do you have any plans to move to those frameworks, or would you accept a pull request implementing them?
Regards,
Luke
Hi Clash,
Right now I'm trying to understand which type of neural network is best suited to recognize the 68 points extracted from the face. So the work that you find here is only for test/study purposes, for me and for everyone who needs a base code.
I'm currently replacing the KNN with an MLP classifier, which is obviously more efficient (in terms of precision) for this purpose.
Of course, pull requests are welcome! I've played very little with PyTorch, and I think that TensorFlow would be the preferable choice.
I've run some tests. During the predict phase, the most time-consuming step is the face encoding.

As you can see, encoding two faces costs ~3s on my hardware (GeForce 940MX).
This is because the jitter parameter used during the training/tuning phase has to be the same when making predictions, and I've chosen 300 in order to increase the variety of distortions applied to the photos before training/prediction.
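For reference, a rough sketch of how the encoding time grows with num_jitters, using the face_recognition API (the photo path here is just a placeholder):

```python
import time

import face_recognition

# load a test photo (placeholder path)
image = face_recognition.load_image_file("test_photo.jpg")

# num_jitters controls how many randomly distorted copies of each face are
# encoded and averaged, so the encoding cost grows roughly linearly with it
for jitters in (1, 100, 300):
    start = time.time()
    encodings = face_recognition.face_encodings(image, num_jitters=jitters)
    print(f"num_jitters={jitters}: {time.time() - start:.2f}s for {len(encodings)} face(s)")
```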
Are you talking about the tuning/training phase or prediction?
Be sure to pull from master; I've migrated to the MLP classifier, which is more precise during prediction.
First of all, thank you very much for taking the time to reply to this issue regarding training optimization.
Second, you pointed out that most of the time is spent encoding faces. This cost could mostly be avoided by using a couple of Keras features, such as the ImageDataGenerator.
Why does the jitter have to be the same during training and prediction? Isn't it normally used as a regularization technique that should therefore be left out during inference, or am I thinking of something else here?
Lastly, I'd love to know if you'd be fine with a (backwards-compatible) switch to a CNN, so that we could compare the performance of Inception-v4 with an MLP.
Hi Sir,
Thank you for your interest in the project! You were completely right!
The problems related to the jitter parameter were caused by the KNN, which was not able to recognize faces when trained with a high number of distortions. If the values do not correspond quite strictly between the train and prediction phases, the network seems not to be very precise (I think this would have to be investigated in the architecture of the KNN, but it's out of scope).
With the MLP architecture (which has increased the prediction confidence by a huge factor), this strange little quirk is no longer relevant, and we can use a different number of jitters during the training and predict phases.
Of course, the jitter parameter causes a lot of time to be spent on image distortion (from the documentation, jitter=300 -> 300x the time used), so using a different approach to create the distortions would be a great performance improvement.
We have to understand the necessary parameters for ImageDataGenerator in order to preserve the prediction quality (jitter averages over the augmented data) and speed up the face encodings.
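Something like the sketch below could be a starting point; the augmentation ranges are just guesses and would have to be tuned so that the prediction quality stays comparable to dlib's jitter:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# augmentation ranges are guesses meant to roughly mimic dlib's jitter
# (small shifts, rotations and zooms of the face crop)
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.05,
    height_shift_range=0.05,
    zoom_range=0.1,
    horizontal_flip=True,
)

# face_crops: 4D array of face images, shape (n, height, width, 3); random
# placeholder data here, in practice the cropped faces from the dataset
face_crops = np.random.rand(4, 150, 150, 3)
augmented_batch = next(datagen.flow(face_crops, batch_size=4, shuffle=False))
```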
From the architectural POV, we can play with the code as much as we want. So we can try a lot of different types of networks, using the same dataset to compare the results.
I expect that (with a higher amount of data) a CNN/RNN would perform better. I suspect, instead, that with very few photos (and most identities in this dataset have fewer than 10), the MLP will perform slightly better.
When changing the NN base code (Classifer.py), it's important to keep the ability to recognize multiple faces in the same photo.
Before migrating to TensorFlow, I think there is some work on my side to clean up the code and standardize the return functions.
Since the ImageDataGenerator has a lot of parameters, it might be better to switch to the new system instead. Unfortunately I got lost in another repository trying to figure out what the jitter parameter does. Could you explain it real quick?
I also have to agree that MLPs might perform better than CNNs on tiny datasets. Luckily the ImageDataGenerator can alleviate this issue quite a bit.
Hi @ClashLuke,
The num_jitters parameter is the number of times to re-sample the face when calculating the encoding.
If num_jitters>1 then each face will be randomly jittered slightly num_jitters times, each run through the 128D projection, and the average used as the face descriptor.
After some tests, I've realized that cv2 is better at "finding" faces in photos when the image has low quality or the person in the photo has a "not centred" face angle.
I think the first step is to migrate the face detection from dlib (face_recognition uses dlib internally) to cv2.dnn.readNetFromCaffe. This will translate into an increase in face detection quality. In contexts like CCTV cameras or low resolution/quality photos, we can be sure to have a fairly optimal face detection tool. Then we can move on to generating augmented data that is helpful for the train/tune process. I think I can start working on the cv2 migration in the next month. I've lost the Jupyter notebook where I started to develop the PoC of the migration.
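A minimal sketch of that kind of detector, assuming the standard res10 SSD Caffe model shipped for OpenCV's dnn module (the file names, input photo and confidence threshold are placeholders):

```python
import cv2
import numpy as np

# placeholder file names for the res10 SSD face detector
net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd_iter_140000.caffemodel")

image = cv2.imread("photo.jpg")  # placeholder input photo
h, w = image.shape[:2]

# the res10 model expects a 300x300 BGR blob with these mean values
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:  # placeholder threshold
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
        print(f"face at ({x1}, {y1}) -> ({x2}, {y2}), confidence {confidence:.2f}")
```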
I don't think I understand. You sample the jitter parameter randomly from a uniform distribution u = [-P, P], where P is a parameter you set somewhere, correct?
Doesn't that mean we get an Irwin-Hall-style sum, implying that we now have an approximately normal distribution with zero mean and σ = (300 · (2P)²/12)^0.5 = 10P? Why jitter so many times?
Another thing I don't quite understand is where OpenCV and dlib come from. I assumed there was an MLP involved in this process?
Lastly, do you know why it's better at figuring out what the jittered faces contain? Are the labels jittered as well?
I can finally say that I know what you're doing when training the model. We can definitely keep the initial pipeline, even though it would be nice to jitter while training. Should I take a shot at rewriting the Classifier in a new TensorFlow branch/fork?
I'd have to rewrite the hyperparameter search though. While I'm at it, I'd also change the architecture to a DenseNet-style network, as dense connections are insanely powerful for MLPs. Would that be an issue for you?
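To show what I mean, here's a rough sketch of a densely connected MLP in Keras; the layer sizes and class count are placeholders, not a proposed final architecture:

```python
import tensorflow as tf

num_classes = 5  # placeholder: number of known identities

# DenseNet-style MLP: every Dense layer sees the concatenation of the input
# and all previous layer outputs, instead of only the previous layer
inputs = tf.keras.Input(shape=(128,))  # 128-D face encoding
features = inputs
for units in (64, 64, 64):  # placeholder layer sizes
    x = tf.keras.layers.Dense(units, activation="relu")(features)
    features = tf.keras.layers.Concatenate()([features, x])
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```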
Hi @ClashLuke, thank you for the interest and sorry for the late response. I'm very busy these days and I can only contribute on weekends.
Of course, you can rewrite every part of the code that you are confident with (:
> Another thing I don't quite understand is where OpenCV and dlib come from. I assumed there was an MLP involved in this process?
dlib is used (through the face_recognition high-level API) in order to extract the points related to the face. It uses a custom version of the library, but now we can use the one present in the master branch of the project.
The MLP is delegated to "link" the face encodings to the labels.
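So the pipeline is roughly like the following sketch (the dataset and the MLP hyperparameters are placeholders, not the exact ones used in the project):

```python
import face_recognition
from sklearn.neural_network import MLPClassifier

# placeholder dataset: list of (image_path, person_name) pairs
dataset = [("photos/alice_1.jpg", "alice"), ("photos/bob_1.jpg", "bob")]

encodings, labels = [], []
for path, name in dataset:
    image = face_recognition.load_image_file(path)
    # dlib extracts the landmarks and projects each face to a 128-D encoding
    faces = face_recognition.face_encodings(image, num_jitters=1)
    if faces:
        encodings.append(faces[0])
        labels.append(name)

# the MLP "links" the 128-D encodings to the person labels
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
clf.fit(encodings, labels)
```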
> Lastly, do you know why it's better at figuring out what the jittered faces contain? Are the labels jittered as well?

From my understanding, no. The jitter performs data augmentation on the image, so the label stays the same.
I've created a Gitter channel to discuss the future changes/roadmap of the project: https://gitter.im/PyRecognizer/PyRecognizer-Development
Thank you once again for your interest in the project.
Are the hyperparameter search and the architecture search important? If not, I'd postpone them for now. The basic model already exists. What's next are the training loop and regularization.
Hi Clash!
Thank you for the effort you put into the analysis! I'm here to explain things if you need some tips on the code.
Of course, we can tune the hyperparameters in the next phase :D
Pretty sure I've got a testable state with model tuning now here. I had to remove balanced accuracy and precision for now, as I wasn't keen on calculating accuracy in buckets.
Now, how would I go about testing this unit?
Any news?
Hi Clash, I'm going to rewrite the "backend engine" from scratch using dlib and TensorFlow. I'm going to update the repo in the next month.
I'm testing the neural network and it has ~97% accuracy on the validation dataset! I'm completely changing the architecture of the code. I think the repository will split into two different parts. Python NeuralNetwork, a webserver that runs on localhost delegated to:
- Load the image
- Recognize the face bounding box
- Perform data augmentation using ImageDataGenerator
- Encode the face using shape_predictor_68_face_landmarks.dat and dlib_face_recognition_resnet_model_v1.dat instead of the face_recognition wrapper, in order to get more accuracy
- Use a TensorFlow Dense network (see the sketch at the end of this message)
Go webservice:
- A new Go frontend delegated to talk with the Python daemon in order to expose the predict functionality.
In the first instance, training will run without HTTP interaction, so scripts will be released in order to train the network "offline".
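As for the TensorFlow Dense network mentioned above, a minimal sketch of what I have in mind (layer sizes, dropout rate and the number of identities are placeholders, not the final values):

```python
import tensorflow as tf

num_classes = 10  # placeholder: number of known identities

# small dense network mapping the 128-D dlib face encoding to a person label
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(128,)),  # 128-D dlib encoding
    tf.keras.layers.Dropout(0.3),                                       # regularization, placeholder rate
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```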