InsightFace_Pytorch
Code does not seem to support resuming training from a saved weight?
I was training a model on CASIA-WebFace and it stopped partway through the total epochs by accident. So I added some lines to Learner.py and tried to resume training, but it failed. Here is my resuming code:
def train(self, conf, epochs, resume=False, fixed_str=None):
    self.model.train()
    conf.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    self.model = nn.DataParallel(self.model)
    self.head = nn.DataParallel(self.head)
    self.model.cuda()
    self.head.cuda()
    start_epoch = 0
    if resume:
        if not fixed_str:
            raise ValueError('must pass the fixed_str parameter when resume=True!')
        self.load_state(conf, fixed_str)
        # recover the global step from the checkpoint filename, then realign it to an epoch boundary
        self.step = int(fixed_str.split('_')[-2].split(':')[1]) + 1
        start_epoch = self.step // len(self.loader)
        self.step = start_epoch * len(self.loader) + 1
        print('loading model at epoch {} done!'.format(start_epoch))
        print(self.optimizer)
    running_loss = 0.
    dc_loss = 0.
    bceloss_func = nn.BCELoss()
    for e in range(start_epoch, epochs):
        print('epoch {} started'.format(e))
        if e == self.milestones[0]:
            self.schedule_lr()
        if e == self.milestones[1]:
            self.schedule_lr()
        # nothing changed below this point
I changed nothing below that point. The weird thing is that whenever I load the saved weights of the model, head and optimizer and then continue training, I get a very high CE loss. I tested it in an IPython notebook: with a randomly initialized learner I get a CE loss around 45, but after loading weights that reach 93% accuracy on LFW I get a CE loss around 77. I suspect the problem lies in the logic of the Arcface class in Learner.py, but I am not sure. Could anyone help me figure out the issue?
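For reference, the step recovery in the snippet above only works if the checkpoint filename follows the save format with a step field. A minimal sketch of that parsing with a hypothetical filename (the date, accuracy, step and batches-per-epoch values below are made up for illustration):

# Hypothetical checkpoint name; only the '..._step:<n>_...' field matters for the parsing.
fixed_str = '2019-04-01-12-00_accuracy:0.93_step:12345_None.pth'

step = int(fixed_str.split('_')[-2].split(':')[1]) + 1   # 'step:12345' -> 12346
batches_per_epoch = 9000                                 # stands in for len(self.loader)
start_epoch = step // batches_per_epoch                  # epoch to resume from
step = start_epoch * batches_per_epoch + 1               # realigned to the epoch boundary
print(start_epoch, step)                                 # -> 1 9001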
Resuming training works well for me. I didn't change anything in the train method and I use the Arcface head.
@boomberung Thanks. Then maybe my problem lies in nn.DataParallel. I will try that later.
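For what it's worth, one common pitfall when mixing nn.DataParallel with checkpoints is that DataParallel prefixes every parameter name with "module.", so a state dict saved from a wrapped model does not line up with a bare model (and vice versa). A minimal, hedged sketch of tolerating that mismatch before load_state_dict (the helper name and checkpoint path are hypothetical, not part of the repo):

import torch
import torch.nn as nn

def load_into(module, checkpoint_path):
    """Load a checkpoint into `module`, tolerating a 'module.' prefix mismatch
    caused by nn.DataParallel wrapping on either side."""
    state = torch.load(checkpoint_path, map_location='cpu')
    wrapped_ckpt = next(iter(state)).startswith('module.')
    wrapped_model = isinstance(module, nn.DataParallel)
    if wrapped_ckpt and not wrapped_model:
        # checkpoint came from a DataParallel model, target is bare: strip the prefix
        state = {k[len('module.'):]: v for k, v in state.items()}
    elif wrapped_model and not wrapped_ckpt:
        # target is wrapped, checkpoint is bare: add the prefix
        state = {'module.' + k: v for k, v in state.items()}
    module.load_state_dict(state)

# e.g. load_into(learner.model, 'model_2019-04-01-12-00_accuracy:0.93_step:12345_None.pth')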
@qq184861643 When I swap ArcFace for my own head I have the same issue as you. The random-initialization loss is ~45, but when I resume training from saved weights the loss starts at ~50 (and LFW accuracy is 94%).
Hi, maybe unrelated to your question, but I wanted to perform face verification and the current ArcFace architecture does not perform very well on my dataset. Is it possible to fine-tune the model with my custom dataset? Thanks in advance.
@boomberung Hi! Have you figured out how to solve this? I've tried several methods but still can't fix it.
@DecentMakeover If we can't solve the resuming issue, I don't think fine-tuning is possible either.
@qq184861643 No, but I found that even though the reported loss looks off, the network keeps learning normally. I think the problem is with this line: loss_board = running_loss / self.board_loss_every
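If that suspicion is right, the odd value is mostly a logging artifact: running_loss gets divided by the fixed self.board_loss_every even when fewer batches have been accumulated since the last TensorBoard write (for example right after a resume). A minimal sketch of the idea inside the train loop, dividing by the actual count instead; the attribute names follow the quoted line and the repo's train method, but treat them as assumptions:

running_loss = 0.
batches_since_log = 0  # batches actually accumulated since the last TensorBoard write

for imgs, labels in iter(self.loader):
    # ... forward/backward pass as in the original train loop ...
    running_loss += loss.item()
    batches_since_log += 1

    if self.step % self.board_loss_every == 0 and self.step != 0:
        # divide by the real number of accumulated batches, not a fixed constant
        loss_board = running_loss / batches_since_log
        self.writer.add_scalar('train_loss', loss_board, self.step)
        running_loss = 0.
        batches_since_log = 0

    self.step += 1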
@qq184861643 @boomberung Have you solved the resuming problem?
@DecentMakeover Yes, fine-tuning on a custom dataset is possible. But I cannot get high accuracy when training on my own custom dataset. Do you have any idea how to solve that?
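In case it helps, here is a rough fine-tuning sketch. It assumes the repo's get_config(), face_learner and load_state(conf, fixed_str, from_save_folder, model_only) behave as in config.py and Learner.py, and that conf's training-data settings already point at the custom dataset (so the Arcface head built inside face_learner is sized for the custom identity count); treat the exact names, checkpoint filename, learning rate and epoch count as assumptions, not verified API.

import torch.optim as optim
from config import get_config
from Learner import face_learner

conf = get_config()
learner = face_learner(conf)  # assumption: builds loader + head from conf's (custom) dataset

# Load only the pretrained backbone weights; keep the freshly initialized head and optimizer.
# 'ir_se50.pth' stands in for the published IR-SE50 checkpoint name.
learner.load_state(conf, 'ir_se50.pth', from_save_folder=True, model_only=True)

# Rebuild the optimizer with a smaller learning rate than training from scratch
# (the repo splits parameter groups more carefully; plain SGD here for brevity).
learner.optimizer = optim.SGD([{'params': learner.model.parameters()},
                               {'params': learner.head.parameters()}],
                              lr=1e-3, momentum=0.9)

learner.train(conf, epochs=8)

The point of model_only=True is to reuse the backbone while letting the classification head and optimizer start fresh for the new identity set; whether this reaches high accuracy still depends heavily on the size and cleanliness of the custom dataset.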