
One eye won't blink?

AlonDan opened this issue 4 years ago • 5 comments

Describe the bug

I'm not sure if it's a bug or an upcoming feature. I noticed that after positioning correctly, blinking with two eyes works great, very nice and smooth. But when blinking with only one eye, the avatar image won't blink that eye; it just stretches a tiny bit in the area of the eye, like it's "trying" to blink.

To Reproduce

  1. Blink with 2 eyes = Your Avatar image will blink.
  2. Blink with Left or Right eye only = See the weird tiny stretch.

Info (please complete the following information):

  • OS: Windows 10 - 64bit
  • GPU model: NVIDIA GeForce GTX 980
  • Any other relevant information:

Suggestion: Please read this part ONLY if this is not a bug but a feature request. I should mention: I'm not a programmer, but I'm an animator who also works in post production, so maybe it's possible to translate how I would "solve" this:

Since the avatar can already produce a very natural BLINK with two eyes, maybe on the programming side there is a way to mask out the non-blinking eye somehow. But it's only a guess, so if that's not how things work (probably), I hope you can add full control over blinking with each eye separately instead of only both eyes together.

AlonDan avatar Sep 18 '20 04:09 AlonDan

First of all: avatarify just makes it easy to use the DeepFake model First Order Motion Model, so performance/image-quality issues actually belong there.

Anyway, I will answer your question. The architecture used to generate the new image is heavily based on neural networks, and it is nearly impossible (for anybody) to control this in any way. The architecture looks for some keypoints in the current frame and tracks where they go, then some processing is applied to get dense motion, and in the last step this is fed DIRECTLY into a neural network. You could try to change or mask out some things in the motion (obviously this would require some coding; see the sketch below), but you can't really change how the neural network predicts the frame. So it's more like a bug (though keep in mind this is real-time and state of the art), and it's not fixable without further research. But who knows, maybe the next paper will predict this right...
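A minimal sketch of where such an intervention could live, assuming a FOMM-style `kp_detector`/`generator` pair like the first-order-model repo exposes; the eye keypoint index is purely hypothetical, since the learned keypoints carry no semantic labels:

```python
# Illustrative only: module names follow the first-order-model repo,
# but the keypoint index below is an assumption -- keypoints are
# learned unsupervised, so you'd first have to inspect which ones
# happen to land near the eyes.

def animate_frame(source, driving, kp_detector, generator, kp_source):
    # 1. Detect sparse keypoints on the current driving frame.
    kp_driving = kp_detector(driving)  # dict, e.g. {'value': (B, K, 2), ...}

    # 2. Hypothetical intervention: pin selected keypoints back to their
    #    source positions, suppressing their motion before dense-motion
    #    estimation (e.g. to damp a one-eyed blink artifact).
    EYE_KP_IDX = [3]  # assumption, not a real label
    kp_driving['value'][:, EYE_KP_IDX] = kp_source['value'][:, EYE_KP_IDX]

    # 3. Dense motion and frame synthesis happen inside the generator;
    #    the network's prediction itself cannot be edited directly.
    out = generator(source, kp_source=kp_source, kp_driving=kp_driving)
    return out['prediction']
```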

mintmaker avatar Oct 15 '20 17:10 mintmaker

The face animation is based on detecting key points on both the source (avatar) and driving images. The key point detector is trained in an unsupervised manner, i.e. it learns which areas of a face move a lot and puts points on those areas. These sparse key points are then processed into dense (pixel-wise) motion which morphs the source image's pose into the driving image's pose.
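A conceptual sketch of that sparse-to-dense step (not avatarify's actual code; `weight_net` is a hypothetical stand-in for the dense motion network, and the real model also uses local affine terms per keypoint):

```python
import torch
import torch.nn.functional as F

def sparse_to_dense_warp(source, kp_source, kp_driving, weight_net):
    # source: (B, C, H, W); kp_*: (B, K, 2) in normalized [-1, 1] coords.
    B, C, H, W = source.shape

    # Identity sampling grid in [-1, 1], shape (H, W, 2).
    grid = F.affine_grid(torch.eye(2, 3).unsqueeze(0),
                         (1, 1, H, W), align_corners=False)[0]

    # One candidate flow per keypoint: shift the whole grid by that
    # keypoint's displacement between driving and source.
    shift = kp_source - kp_driving                            # (B, K, 2)
    flows = grid[None, None] + shift[:, :, None, None, :]     # (B, K, H, W, 2)

    # Soft per-pixel assignment over the K candidate flows (the real
    # network also predicts a background/static component, omitted here).
    weights = weight_net(source, kp_source, kp_driving)       # (B, K, H, W)
    dense_flow = (weights.unsqueeze(-1) * flows).sum(dim=1)   # (B, H, W, 2)

    # Backward-warp the source image with the combined dense flow.
    return F.grid_sample(source, dense_flow, align_corners=False)
```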

Learning is, in other words, loss function optimisation: the loss measures the difference between the predicted and ground-truth images. If the detector predicts key points which do not capture the movement, the result will be far from the ground truth, and the detector will be penalised until it predicts more meaningful key points.
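In symbols, a simplified version of that objective might read as below (the actual FOMM loss is a multi-scale perceptual loss over VGG features plus equivariance terms, not a plain pixel norm):

```latex
% S: source frame, D: driving frame. Both are sampled from the same
% training video, so D doubles as the ground truth for the prediction.
\mathcal{L}(\theta) = \bigl\| G_\theta\bigl(S,\ \mathrm{KP}_\theta(S),\ \mathrm{KP}_\theta(D)\bigr) - D \bigr\|_1
```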

So the key to the answer here is "moving parts of a face". The network is trained on the VoxCeleb dataset of celebrities giving interviews. People don't usually wink in these videos, so the neural network doesn't bother either. It's simply not penalised for failing to do what it was never asked to do.

The best way to fix this is to train it on winking faces. Any handcrafted algorithm would lead to less natural results: this becomes apparent if you think about how a human face changes when someone winks -- other parts of the face also move when you close one eye. Simply put, the neural network has to learn how the entire face changes when one winks.

alievk avatar Oct 19 '20 07:10 alievk

Yes, exactly, but the morphing is also done by another trained neural network, which probably makes the results way better. Its main task is to fill in the occlusions which occur when the eye is closed. If it has never seen just one eye closed, this may lead to poor performance. So that one would have to be retrained, too.
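Roughly how that occlusion handling looks inside a FOMM-style generator (a simplified sketch; names are illustrative, and the real code warps multi-scale encoder features):

```python
import torch.nn.functional as F

def occlusion_aware_decode(features, dense_flow, occlusion_map, decoder):
    # Warp encoder features toward the driving pose.
    warped = F.grid_sample(features, dense_flow, align_corners=False)

    # occlusion_map in [0, 1] downweights regions the flow can't explain,
    # e.g. skin revealed behind a closing eyelid.
    if occlusion_map.shape[2:] != warped.shape[2:]:
        occlusion_map = F.interpolate(occlusion_map, size=warped.shape[2:])
    warped = warped * occlusion_map

    # The decoder must inpaint the zeroed regions -- exactly the part
    # that degrades if it never saw a single closed eye during training.
    return decoder(warped)
```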

mintmaker avatar Oct 19 '20 07:10 mintmaker

The decoder has a rather limited receptive field and doesn't differentiate between the eyes. In other words, it doesn't "see" two eyes at once. But these are details.

alievk avatar Oct 20 '20 11:10 alievk

Okay, I thought it was a rather deep FCNN and thus one eye would be influenced by the other (similar to patch adversarial attacks influencing almost the whole output of an FCNN).

mintmaker avatar Oct 20 '20 12:10 mintmaker