face.evoLVe

Why is the training prec@1/5 always 0.00000? (#93)

Open · sstzal opened this issue 5 years ago • 16 comments

When I run train.py with MS1M as the training dataset, the Training Prec@1 and Training Prec@5 are always 0.000000. Could anyone tell me why? (screenshot attached)
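For reference, Prec@k here is top-k classification accuracy on the training batch: the fraction of samples whose ground-truth identity appears among the k highest-scoring logits. A minimal sketch of such a metric, with a hypothetical helper name rather than the repo's exact utility:

import torch

def topk_precision(logits, target, ks=(1, 5)):
    # logits: (batch, num_classes) raw scores; target: (batch,) class indices
    maxk = max(ks)
    _, pred = logits.topk(maxk, dim=1, largest=True, sorted=True)   # (batch, maxk)
    correct = pred.eq(target.view(-1, 1))                           # (batch, maxk) bool
    return [correct[:, :k].any(dim=1).float().mean().item() for k in ks]

With tens of thousands of MS1M identities, a freshly initialized model is expected to score near zero at first, but staying at exactly 0.000000 for whole epochs usually points to a problem elsewhere.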

sstzal avatar Sep 26 '19 05:09 sstzal

Your training loss is still too large; keep training.

Samonsix avatar Oct 09 '19 01:10 Samonsix

Did you solve it? @sstzal

NHDat2 avatar Apr 06 '20 09:04 NHDat2

I have the same problem as you, even though my loss is still around 17 after the 20th epoch.

NHDat2 avatar Apr 06 '20 09:04 NHDat2

Did you solve it? @sstzal

Yes, I solved the problem. Just wait, and the training loss will fall.

sstzal avatar Apr 06 '20 09:04 sstzal

For your Prec@1 to become non-zero, how many epochs did it take, and what was the loss at that point? @sstzal

NHDat2 avatar Apr 06 '20 09:04 NHDat2

For your Prec@1 to become non-zero, how many epochs did it take, and what was the loss at that point? @sstzal

After about two epochs, the Training Prec@1 and Training Prec@5 start to grow slowly. Since you still have the problem after the 20th epoch, I think there may be a mistake in your code.

sstzal avatar Apr 06 '20 09:04 sstzal

My Prec@1/5 is always 0.0000, yet the evaluation accuracy is around 0.7xxx. Is a training set of 6k images (100 MB) not enough? My Prec@1/5 is still 0.000xx even at the 40th epoch.

NHDat2 avatar Apr 06 '20 10:04 NHDat2

@NHDat2 If you train with ArcFace, you should change the hyperparameters m and s.
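For context, in an ArcFace-style head m is the additive angular margin applied to the ground-truth class and s is the scale applied to the logits, and both are usually constructor arguments of the head. A minimal, generic sketch (not the repo's exact head/metrics.py implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginHead(nn.Module):
    # Generic ArcFace-style head: logits = s * cos(theta + m) for the true class.
    def __init__(self, in_features, out_features, s=64.0, m=0.50):
        super().__init__()
        self.s, self.m = s, m
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings, labels):
        # cosine similarity between L2-normalized embeddings and class weights
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        with_margin = torch.cos(theta + self.m)
        one_hot = F.one_hot(labels, cosine.size(1)).to(cosine.dtype)
        # apply the margin only to the ground-truth class, then scale everything by s
        return self.s * (one_hot * with_margin + (1 - one_hot) * cosine)

Most ArcFace implementations default to roughly s=64 and m=0.5; using a smaller margin (or training the first epoch without it) is a common way to get Prec@1 off zero early in training.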

ReverseSystem001 avatar Apr 09 '20 08:04 ReverseSystem001

I also have this question, and I have already trained for 21 epochs. However, the performance on the validation set keeps getting better and better... I use the same dataset that the README.md mentions. (screenshot from 2020-04-10 attached)

zhangxilun avatar Apr 10 '20 02:04 zhangxilun

ArcFace loss?

ReverseSystem001 avatar Apr 10 '20 02:04 ReverseSystem001

Yes, I use ArcFace loss. Another key point: I changed the embedding size from the recommended 512 to 256, because I want to run the model on my own mobile device. Maybe a 256-d embedding doesn't have enough capacity to represent the differences between persons?
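If it helps others, the embedding size is just the width of the backbone's final projection, and the head has to be built with the same width. A minimal sketch with hypothetical names (EMBEDDING_SIZE, NUM_CLASSES, and the layer shapes are placeholders, not the repo's config):

import torch
import torch.nn as nn

EMBEDDING_SIZE = 256   # changed from the recommended 512
NUM_CLASSES = 1000     # illustrative identity count

# final projection of the backbone: pooled feature map -> embedding vector
output_layer = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, EMBEDDING_SIZE),  # assumes a 512x7x7 feature map
    nn.BatchNorm1d(EMBEDDING_SIZE),
)

# whatever head is used (ArcFace, CosFace, plain softmax), its input width
# must match EMBEDDING_SIZE, otherwise shapes and checkpoints will not line up
head = nn.Linear(EMBEDDING_SIZE, NUM_CLASSES)

features = torch.randn(4, 512, 7, 7)       # fake backbone output
logits = head(output_layer(features))      # (4, NUM_CLASSES)

Mobile-oriented face models commonly use 128-d or 256-d embeddings, so the width change alone is unlikely to pin Prec@1 at exactly zero.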

zhangxilun avatar Apr 13 '20 02:04 zhangxilun

The hyperparameters m and s should be changed. You can set m=45, s=0.3 and train for one epoch; if that works, the top accuracy will not stay at zero.

ReverseSystem001 avatar Apr 13 '20 03:04 ReverseSystem001

I had the same issue. After looking into it, I found that with plain cross-entropy loss the precision does improve. Changing focal.py from:

class FocalLoss(nn.Module):
    def __init__(self, gamma = 2, eps = 1e-7):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.eps = eps
        self.ce = nn.CrossEntropyLoss()

    def forward(self, input, target):
        logp = self.ce(input, target)
        p = torch.exp(-logp)
        loss = (1 - p) ** self.gamma * logp
        return loss.mean()

to:

class FocalLoss(nn.Module):
    def __init__(self, gamma = 2, eps = 1e-7):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.eps = eps
        self.ce = nn.CrossEntropyLoss(reduction='none')  # keep per-sample losses instead of one batch-mean scalar

    def forward(self, input, target):
        logp = self.ce(input, target)
        p = torch.exp(-logp)
        loss = (1 - p) ** self.gamma * logp
        return loss.mean()

resolved the issue (at least for me), and now the precision also improves with FocalLoss.
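For anyone wondering why that one-argument change matters: with the default reduction='mean', self.ce already returns a single batch-averaged scalar, so the focal weight (1 - p) ** gamma is computed once from the batch mean and the final .mean() is a no-op; with reduction='none' the weighting is applied per sample before averaging, which is what focal loss intends. A small illustrative comparison:

import torch
import torch.nn as nn

logits = torch.randn(8, 100)              # fake batch: 8 samples, 100 classes
target = torch.randint(0, 100, (8,))
gamma = 2

# original behaviour: one scalar CE, so a single focal weight for the whole batch
ce_mean = nn.CrossEntropyLoss()(logits, target)
loss_old = (1 - torch.exp(-ce_mean)) ** gamma * ce_mean

# fixed behaviour: per-sample CE, so easy samples are down-weighted individually
ce_each = nn.CrossEntropyLoss(reduction='none')(logits, target)
loss_new = ((1 - torch.exp(-ce_each)) ** gamma * ce_each).mean()

print(loss_old.item(), loss_new.item())   # generally different values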

ItamarKanter avatar Oct 14 '20 13:10 ItamarKanter

@ReverseSystem001 Can you show me where the hyperparameters m and s are set?

fungtion avatar Apr 16 '21 06:04 fungtion

I have the same issue and don't have enough GPU resources. I trained on Google Colab and had to reduce the number of images in the dataset (training on the casia-maxpy-clean dataset with 401 image folders).

Top1 and Top5 don't change after 29 epochs.

Has anyone found a solution to this problem?

(screenshot attached)

quangtn266 avatar Jun 03 '21 02:06 quangtn266