EmoNet: explanation of the landmark heatmap output

First of all, congratulations on your great work, and thanks a lot for sharing it!

I was using EmoNet to extract some face embeddings. After inspecting the code and the model's output, I would like to ask for more information on how to interpret the out["heatmap"] tensor that the model returns. Its shape is (68, 64, 64). Since you extract 68 facial landmarks, my intuition is that it is a kind of attention matrix over the landmarks, in other words, an indication of which landmarks were more relevant when predicting the emotion class. But then why 64x64? Maybe I am wrong.

Thanks in advance,

David.

Jul 14 '23 16:07 david-gimeno

@david-gimeno Hi David. I'm stuck on the same problem. Any updates from your investigation?

Sep 19 '23 12:09 Developer1881

@Developer1881 My intuition is that, at some point in the model's forward pass, the cropped face is embedded into a 64x64 latent representation, and then one heatmap is predicted for each of the 68 facial landmarks, with the objective of concentrating the 'heat' at the position where that landmark should be. In other words, the model learns to locate each landmark through heatmaps over the face image. BUT, I am not sure.

Sep 20 '23 07:09 david-gimeno
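If the heatmap hypothesis above is right, each of the 68 maps should peak where its landmark is, so taking the per-map argmax should give a rough landmark location on the 64x64 grid. The sketch below is a hedged illustration, not EmoNet's own decoding code: `heatmaps_to_landmarks` is a hypothetical helper, and it assumes you have already converted the tensor to NumPy (e.g. `out["heatmap"][0].detach().cpu().numpy()` if a batch dimension is still present).

```python
import numpy as np

def heatmaps_to_landmarks(heatmaps: np.ndarray) -> np.ndarray:
    """Hypothetical helper: decode (N, H, W) heatmaps into (N, 2) integer (x, y) peaks.

    Assumes each map is highest at its landmark's location (hard argmax decoding).
    """
    n, h, w = heatmaps.shape
    # Index of the maximum value in each flattened heatmap
    flat_idx = heatmaps.reshape(n, -1).argmax(axis=1)
    # Convert flat indices back to (row, column) = (y, x)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)

# Synthetic stack shaped like out["heatmap"]: (68, 64, 64)
rng = np.random.default_rng(0)
fake = rng.random((68, 64, 64)) * 0.1
fake[0, 10, 20] = 1.0  # landmark 0 peaks at row 10, column 20
coords = heatmaps_to_landmarks(fake)
print(coords.shape)  # (68, 2)
print(coords[0])     # [20 10]
```

The argmax only gives integer grid positions, so its precision is limited to one 64x64 cell.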

@david-gimeno I tried it and the results look reasonable. Each 64x64 matrix works as a probability map of where one of the 68 points lies on a 64x64 pixel grid, so I then scale the positions up to 256x256.

Sep 20 '23 11:09 Developer1881
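Reading each 64x64 map as a probability distribution, as suggested above, leads naturally to a soft-argmax: normalise the map, take the expected (x, y), then multiply by 256/64 = 4 to get coordinates in a 256x256 crop. This is a sketch under assumptions, not something the EmoNet code is confirmed to do: soft-argmax is a swapped-in technique, the 256x256 target size comes from the comment above, and `soft_argmax` and `to_image_coords` are hypothetical helper names.

```python
import numpy as np

def soft_argmax(heatmaps: np.ndarray) -> np.ndarray:
    """Treat each (H, W) heatmap as a probability map and return its expected (x, y)."""
    n, h, w = heatmaps.shape
    # Clip negatives and normalise each map so it sums to 1
    probs = np.clip(heatmaps, 0.0, None).reshape(n, -1)
    probs = probs / np.maximum(probs.sum(axis=1, keepdims=True), 1e-12)
    probs = probs.reshape(n, h, w)
    xs = np.arange(w)
    ys = np.arange(h)
    # Marginalise over rows (for x) and columns (for y), then take expectations
    ex = (probs.sum(axis=1) * xs).sum(axis=1)
    ey = (probs.sum(axis=2) * ys).sum(axis=1)
    return np.stack([ex, ey], axis=1)

def to_image_coords(coords: np.ndarray, heatmap_size: int = 64,
                    image_size: int = 256) -> np.ndarray:
    """Scale heatmap-grid coordinates to the (assumed) 256x256 face crop."""
    return coords * (image_size / heatmap_size)

hm = np.zeros((68, 64, 64))
hm[:, 10, 20] = 1.0  # every landmark peaks at row 10, column 20
pts = to_image_coords(soft_argmax(hm))
print(pts[0])  # [80. 40.]
```

Unlike a hard argmax, the expectation can fall between grid cells, which partly recovers the precision lost at 64x64. Plain multiplication maps cell indices to block corners; a half-cell offset, `(x + 0.5) * 4 - 0.5`, is a common alternative if pixel centres should line up.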