Training with own dataset gives poor results
I am currently training on my own prepared dataset of 48 videos, each featuring a different person, split into 40 training, 3 validation, and 5 test videos. However, in my current results the predicted face opens its mouth only very slightly, and the mouth barely moves.
I've also encountered this problem. I suggest you check whether the data are properly normalized, and try adding LayerNorm layers before computing the attention scores.
May I ask how you normalized your data? I normalized the vertex values to [0, 1] using min-max scaling. Did you end up getting good results?
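For reference, this is roughly what I do, in simplified form (the shapes and the FLAME-style vertex count are just illustrative, not my exact preprocessing code):

```python
import numpy as np

# (num_frames, num_vertices * 3) flattened vertex coordinates per frame
verts = np.random.rand(1000, 5023 * 3)  # placeholder for real vertex data

v_min = verts.min(axis=0, keepdims=True)
v_max = verts.max(axis=0, keepdims=True)

# Per-coordinate min-max scaling into [0, 1]
verts_normalized = (verts - v_min) / (v_max - v_min + 1e-8)
```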
Besides raw-data normalization, you could try adding an nn.LayerNorm in the original FaceFormer network before the transformer decoder layer, or simply passing norm_first=True to faceformer.decoder_layer during initialization. I haven't tested this on FaceFormer, but I recently fixed a very similar problem with this technique.
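A minimal sketch of the norm_first variant, assuming the decoder is built with PyTorch's nn.TransformerDecoderLayer as in the FaceFormer code (the hyperparameters here are illustrative, not the repo's exact values):

```python
import torch.nn as nn

feature_dim = 64  # illustrative; use your model's feature dimension

# Pre-LN ("norm_first") decoder layer: LayerNorm runs before each
# attention / feed-forward sub-layer instead of after it, which often
# stabilizes training.
decoder_layer = nn.TransformerDecoderLayer(
    d_model=feature_dim,
    nhead=4,
    dim_feedforward=2 * feature_dim,
    batch_first=True,
    norm_first=True,  # requires PyTorch >= 1.10
)
transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)
```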
Also, try different normalization techniques when dealing with raw data, e.g. z-score normalization.
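If it helps, a minimal sketch of z-score normalization (shapes and vertex count are illustrative):

```python
import numpy as np

# Statistics are computed on the training split only, then reused
# for validation and test sequences.
train_verts = np.random.rand(1000, 5023 * 3)  # placeholder for real data

mean = train_verts.mean(axis=0, keepdims=True)
std = train_verts.std(axis=0, keepdims=True)

def zscore_normalize(verts, eps=1e-8):
    # Per-coordinate standardization: zero mean, unit variance
    return (verts - mean) / (std + eps)

normalized = zscore_normalize(train_verts)
```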
Thank you very much for your suggestion. I tried adding LayerNorm before the transformer decoder layer in the original FaceFormer network, but found that it had no effect. I will try z-score normalization later.
Hello, has this problem been resolved?