Training with own dataset gives poor results
I am currently training on my own prepared dataset of 48 videos, each featuring a different person, split into 40 training, 3 validation, and 5 test videos. However, in my current results the predicted face opens its mouth only very slightly, and the mouth barely moves.
I've also encountered this problem. I suggest you check whether the data are properly normalized, and try adding LayerNorm layers before computing the attention scores.
May I ask how you normalized your data? I normalized the vertex values to [0, 1] using min-max scaling. Did you end up getting good results?
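For reference, this is roughly what I do, in simplified form (the shapes and the FLAME-style vertex count are just illustrative, not my exact preprocessing code):

```python
import numpy as np

# (num_frames, num_vertices * 3) flattened vertex coordinates per frame
verts = np.random.rand(1000, 5023 * 3)  # placeholder for real vertex data

v_min = verts.min(axis=0, keepdims=True)
v_max = verts.max(axis=0, keepdims=True)

# Per-coordinate min-max scaling into [0, 1]
verts_normalized = (verts - v_min) / (v_max - v_min + 1e-8)
```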
Besides raw-data normalization, you could try adding an nn.LayerNorm in the original FaceFormer network before the transformer decoder layer, or simply passing norm_first=True to faceformer.decoder_layer during initialization. I haven't tested this on FaceFormer, but I recently fixed a very similar problem with this technique.
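A minimal sketch of the norm_first variant, assuming the decoder is built with PyTorch's nn.TransformerDecoderLayer as in the FaceFormer code (the hyperparameters here are illustrative, not the repo's exact values):

```python
import torch.nn as nn

feature_dim = 64  # illustrative; use your model's feature dimension

# Pre-LN ("norm_first") decoder layer: LayerNorm runs before each
# attention / feed-forward sub-layer instead of after it, which often
# stabilizes training.
decoder_layer = nn.TransformerDecoderLayer(
    d_model=feature_dim,
    nhead=4,
    dim_feedforward=2 * feature_dim,
    batch_first=True,
    norm_first=True,  # requires PyTorch >= 1.10
)
transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)
```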
Also, try different normalization techniques when dealing with raw data, e.g. z-score normalization.
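If it helps, a minimal sketch of z-score normalization (shapes and vertex count are illustrative):

```python
import numpy as np

# Statistics are computed on the training split only, then reused
# for validation and test sequences.
train_verts = np.random.rand(1000, 5023 * 3)  # placeholder for real data

mean = train_verts.mean(axis=0, keepdims=True)
std = train_verts.std(axis=0, keepdims=True)

def zscore_normalize(verts, eps=1e-8):
    # Per-coordinate standardization: zero mean, unit variance
    return (verts - mean) / (std + eps)

normalized = zscore_normalize(train_verts)
```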
Thank you very much for your suggestion. I tried adding LayerNorm before the transformer decoder layer in the original FaceFormer network, but found that it had no effect. I will try z-score normalization later.
Hello, has this problem been resolved?