Pre-trained model provides random result

Open iPsych opened this issue 5 years ago • 19 comments

I downloaded AFEW and extracted the frames like below.

/Data/Val/Angry/000149120/out-001.jpg out-002.jpg.... out-069.jpg

But `CUDA_VISIBLE_DEVICES=0 python3 Demo_AFEW_Attention.py -e` shows dramatically different results on each run. Should the AFEW files be preprocessed some other way, or is a version mismatch suspected?

iPsych avatar Feb 08 '20 23:02 iPsych

If you use the same model and parameters in each run, it shouldn't behave like this. If the model parameters are random at each run, the results will differ dramatically.

Open-Debin avatar Feb 16 '20 08:02 Open-Debin

The pretrained model provides 46% on AFEW whereas the paper reports 51%. What could be the scope for improvement?

rshivansh avatar Feb 21 '20 22:02 rshivansh

The pretrained weights lack the pred_fc1 and pred_fc2 weights. They include a single fc layer that is ignored when loaded into the model, and it has an output size of 8 rather than 7 (FER+ includes 'contempt'). So I adjusted the model, slammed the fc weights into pred_fc1, changed the size from 7 to 8, and to my great surprise, it seems to work very well.

wildermuthn avatar Feb 21 '20 23:02 wildermuthn

The pretrained weight file has FER+ in its name, which clued me in to the problem. It was obviously trained on a different model architecture.

wildermuthn avatar Feb 21 '20 23:02 wildermuthn

I didn't run the eval on AFEW, but rather on faces in a custom dataset. If you're using the pretrained weights, I believe you'll need to retrain the model while possibly freezing all but the last two fc layers, roughly as sketched below.
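Something along these lines could work (a sketch, not the repo's training code; the head names pred_fc1/pred_fc2 come from Code/Model.py, everything else is my own assumption):

    import torch.nn as nn

    def freeze_all_but_heads(model: nn.Module):
        # freeze every parameter except the two prediction heads
        for name, param in model.named_parameters():
            param.requires_grad = name.startswith(('pred_fc1', 'pred_fc2'))
        # hand back only the trainable parameters, e.g. for torch.optim.SGD
        return [p for p in model.parameters() if p.requires_grad]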

wildermuthn avatar Feb 21 '20 23:02 wildermuthn

Thanks @wildermuthn. I will try this and get back if I have more queries.

rshivansh avatar Feb 21 '20 23:02 rshivansh

Thanks, @wildermuthn. Would you let me know how you modified the code for testing the new frames?

iPsych avatar Feb 22 '20 01:02 iPsych

@iPsych

https://github.com/Open-Debin/Emotion-FAN/blob/master/Code/load_materials.py#L50-L55

to

    # map the checkpoint's single fc layer onto pred_fc1; copy everything else,
    # stripping the 'module.' prefix added by DataParallel
    for key in pretrained_state_dict:
        if (key == 'module.fc.weight'):
            model_state_dict['pred_fc1.weight'] = pretrained_state_dict[key]
        elif (key == 'module.fc.bias'):
            model_state_dict['pred_fc1.bias'] = pretrained_state_dict[key]
        else:
            model_state_dict[key.replace('module.', '')] = pretrained_state_dict[key]

And

https://github.com/Open-Debin/Emotion-FAN/blob/master/Code/Model.py#L119

to

        self.pred_fc1 = nn.Linear(512, 8)

You'll need to ensure you run the demo script with --at_type 0.

wildermuthn avatar Feb 25 '20 17:02 wildermuthn

Thanks @wildermuthn,

How did you prepare the image data? I am using face frames extracted with dlib into 224x224 .png files. Did you use the same approach?

To get the individual results, I modified /Code/util.py line 8 from `_, pred = output.topk(maxk, 1, True, True)` to `score, pred = output.topk(maxk, 1, True, True)` and added the lines below at lines 11-13.

    print(score)
    print(pred)
    print(correct)
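For reference, the whole helper after my change looks roughly like this (a sketch on my side, assuming the rest of util.py follows the standard PyTorch top-k accuracy pattern):

    import torch

    def accuracy(output, target, topk=(1,)):
        # keep both the top-k scores and the predicted class indices
        maxk = max(topk)
        score, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))
        # debug prints to inspect per-video results
        print(score)
        print(pred)
        print(correct)
        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0)
            res.append(correct_k.mul_(100.0 / target.size(0)))
        return res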

When run with --at_type 0 -e, the result looks like below. The score and pred are the eval results for each video, right? I am very confused by these seemingly random results.

tensor([[0.3808], [0.3926], [0.4905], [0.3123], [0.4512], [0.3670], [0.3544], [0.2804], [0.5823], [0.4857], [0.3774], [0.3549], [0.4674], [0.4570], [0.4397], [0.5016], [0.4968], [0.4791], [0.7178], [0.2777], [0.5665], [0.3887], [0.4222], [0.3521]]) tensor([[6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]]) tensor([[False, False, False, False, False, False, True, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False]])

tensor([[0.3453], [0.3778], [0.4147], [0.3899], [0.3512], [0.3242], [0.3353], [0.3923], [0.4174], [0.3654], [0.3732], [0.3552], [0.4328], [0.3729], [0.4283], [0.3892], [0.2489], [0.2897], [0.3648], [0.3452], [0.3545], [0.2920], [0.4728], [0.3690]]) tensor([[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 4]]) tensor([[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, False, True, False, False, False]])

iPsych avatar Mar 01 '20 14:03 iPsych

Yeah, I did the same with dlib, except that you do need to increase the scale of the face-extract by 25%, according to the paper.

For inference, I had to translate the integer result with the following map:

    kmap = {0: 'neutral', 1: 'happiness', 2: 'surprise', 3: 'sadness',
            4: 'anger', 5: 'disgust', 6: 'fear', 7: 'contempt'}

It is possible that this mapping is incorrect, but my testing on individual frames seems to validate it. It corresponds to the FER+ dataset.
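As a hypothetical usage sketch, the map can be applied to the model's per-class scores like this (here `logits` is just a stand-in tensor for whatever the model returns for a batch of videos):

    import torch

    kmap = {0: 'neutral', 1: 'happiness', 2: 'surprise', 3: 'sadness',
            4: 'anger', 5: 'disgust', 6: 'fear', 7: 'contempt'}

    # stand-in for the model output: one row per video, 8 class scores
    logits = torch.randn(3, 8)

    pred_idx = logits.argmax(dim=1)              # index of the highest score
    labels = [kmap[i.item()] for i in pred_idx]  # translate to emotion names
    print(labels)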

During inference, the code seems to run the model twice, just with different parameters:

https://github.com/Open-Debin/Emotion-FAN/blob/master/Demo_AFEW_Attention.py#L181 https://github.com/Open-Debin/Emotion-FAN/blob/master/Demo_AFEW_Attention.py#L213

I didn't change either of those. It looks like you're referencing code that measures precision, which I removed entirely.

wildermuthn avatar Mar 02 '20 14:03 wildermuthn

@wildermuthn, thanks. Can you explain a little more about expanding the dlib bounding box to 125% and extracting the frames?

iPsych avatar Mar 03 '20 03:03 iPsych

I recommend http://dlib.net/. The parameter `padding=0.25` in `dlib.get_face_chips(img, faces, size=112, padding=0.25)` expands the dlib bounding box to 125%. @iPsych
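For illustration, a rough end-to-end sketch of that preprocessing (the 68-point landmark model file and the 224-pixel output size are just example choices here; adjust them for your setup):

    import dlib

    detector = dlib.get_frontal_face_detector()
    # 68-point landmark model, downloaded separately from dlib.net
    predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

    img = dlib.load_rgb_image('out-001.jpg')
    faces = dlib.full_object_detections()
    for rect in detector(img, 1):
        faces.append(predictor(img, rect))

    # padding=0.25 expands the aligned crop by 25% around the detected face
    chips = dlib.get_face_chips(img, faces, size=224, padding=0.25)
    for i, chip in enumerate(chips):
        dlib.save_image(chip, 'face-%03d.png' % i)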

Open-Debin avatar Mar 06 '20 08:03 Open-Debin

@Open-Debin, thanks. The get_face_chips function works perfectly : ) I am testing with some AFEW faces cropped to 224 pixels as described in your paper. @Open-Debin @wildermuthn, I tested with several properly cropped clips. When I test three stimuli, two Angry clips (with the same frames) and one Surprise clip, the results are below. The tensor values are consistent, so I think the model is working somehow, but the labels are all zeros. Is there anything I have to modify or edit?

(1st trial)
[3.9368]
[3.9368]
[5.8051]

[0,0,0]
(2nd trial)
[3.9757]
[3.9757]
[5.8079]

[0,0,0]

iPsych avatar Mar 07 '20 08:03 iPsych

@wildermuthn If I want to print the estimated category for the input images, which variable do I have to print? I think the variables I printed in util.py were wrong, right? (i.e. input the frames of two video clips -> get [0], [5])

iPsych avatar Mar 07 '20 09:03 iPsych

After returning to this with fresh eyes, I think the problem is simple: the pre-trained weights provided aren't for the full model, but only the ResNet portion. From the paper:

By default, for feature embedding, we use the ResNet18 which is pre-trained on MSCeleb-1M [21] face recognition dataset and FER Plus expression dataset [22].

The weights provided are for the pretrained ResNet18, not for Emotion-FAN, which, to be completely fair, both the README and the file name itself indicate. Perhaps the README could be clearer that this pretrained ResNet model isn't sufficient to run the full model. One quick way to see this is sketched below.
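A minimal sketch for inspecting the checkpoint (the path is the one used by the repo's scripts; whether the weights are wrapped in a 'state_dict' key is an assumption):

    import torch

    # load the provided checkpoint on CPU and list what it actually contains
    ckpt = torch.load('./pretrain_model/Resnet18_FER+_pytorch.pth.tar',
                      map_location='cpu')
    state_dict = ckpt['state_dict'] if 'state_dict' in ckpt else ckpt
    for key, tensor in state_dict.items():
        print(key, tuple(tensor.shape))
    # only ResNet18 backbone keys plus a single 8-way fc show up;
    # the FAN attention layers and pred_fc1/pred_fc2 are missing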

wildermuthn avatar Apr 21 '20 18:04 wildermuthn

@wildermuthn @iPsych @rshivansh Merry Christmas! I recently updated Emotion-FAN; the new features include data processing, environment installation, CK+ code, and baseline code. You can also find the old-version directory of Emotion-FAN noted in the README.md. I hope the new updates help you greatly. Please see the Emotion-FAN repository for more details.

Open-Debin avatar Dec 27 '20 01:12 Open-Debin

Thank you for sharing updates on Emotion-FAN.

I am running the fan_afew_manifest.py code on AFEW faces generated using the same code. But the model shows a training accuracy of 41% and a validation accuracy of 7.38% (constant) after 180 epochs. I have not modified any part of the code and followed all the steps. Can you please help in reproducing the results shared in the paper?

I tried changing the learning rate too, but the validation accuracy did not change.

Rasipuram avatar Jan 12 '21 05:01 Rasipuram

Thank you for sharing and updating Emotion-FAN. I am also running fan_afew_manifest.py on AFEW faces generated with the same code, but the model shows a training accuracy of 56% after 100 epochs and a validation accuracy of 9.37% (always unchanged). I did not modify any part of the code and followed all the steps. Can you help reproduce the results shared in the paper? I also tried changing the learning rate, but the validation accuracy did not change.

gongweijun avatar Apr 04 '21 12:04 gongweijun

If you evaluate using the -e option, check this part of the code.

https://github.com/Open-Debin/Emotion-FAN/blob/874e871999a2002cd5dd9dffff2c4400c2e1805b/fan_afew_traintest.py#L35

Change `_parameterDir = './pretrain_model/Resnet18_FER+_pytorch.pth.tar'` to `_parameterDir = './model/self_relation-attention_x_x.xxxx.pth.tar'` (where x is the model accuracy).

You are probably loading and evaluating the default Resnet18 model rather than the trained model.

VaulTroll avatar Nov 14 '22 06:11 VaulTroll