question about the use of ExponentialMovingAverage
here is my code:
from torch_ema import ExponentialMovingAverage
model = ...
optimizer = ...
scheduler = ...
ema_model = ExponentialMovingAverage(parameters=pg, decay=0.9999)
for epoch in range(args.epochs):
    # train
    print('epoch:', epoch, 'Current learning rate:', optimizer.param_groups[0]['lr'])
    train_loss, train_MAE, tb_writer = train_one_epoch(model=model,
                                                       optimizer=optimizer,
                                                       data_loader=train_loader,
                                                       device=device,
                                                       epoch=epoch,
                                                       scheduler=scheduler,
                                                       csv_filename=args.csv_filename,
                                                       tb_writer=tb_writer)
    scheduler.step()
    ema_model.update()
    # validate
    with ema_model.average_parameters():
        val_loss, val_MAE = evaluate(model=model, data_loader=val_loader, device=device, epoch=epoch)
As shown in the code, I run the evaluate function on the validation set after each epoch of training. I found that the validation results are exactly the same when I set different decay values. Why is that? The evaluate function is as follows:
@torch.no_grad()
def evaluate(model, data_loader, device, epoch):
    softceloss_function = SoftCrossEntropy()
    model.eval()
    data_loader = tqdm(data_loader)
    for step, data in enumerate(data_loader):
        images, names, labels = data
        pred = model(images.to(device))
        softlabel = softlabel_function(labels)  # a function to convert labels to softlabel
        loss = softceloss_function(pred, softlabel.to(device))
    val_loss, val_MAE = ...  # calculate loss and MAE
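For context, my understanding of what ema_model.update() does is roughly the following (just my own sketch of the exponential-moving-average rule, not the actual library code), which is why I expected different decay values to give noticeably different validation results:
# my mental model of one ema_model.update() call: each shadow (averaged) parameter s
# moves towards the current trainable parameter p, and decay controls how slowly it moves
def ema_update(shadow_params, params, decay):
    for s, p in zip(shadow_params, params):
        s.mul_(decay).add_(p, alpha=1 - decay)  # s = decay * s + (1 - decay) * p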
Hi sevennotmouse,
ema_model = ExponentialMovingAverage(parameters=pg, decay=0.9999)
Is pg changing in your example? Where does it come from? If there is a chance that pg does not change, maybe that would explain the behaviour you are seeing.
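If you are unsure, a quick sanity check would be to snapshot the tensors in pg before a training step and compare them afterwards. This is only a sketch (you would have to splice the single training step into your own loop):
# snapshot the trainable parameters before one training step
before = [p.detach().clone() for p in pg]
# ... run one forward/backward pass and optimizer.step() here ...
# if nothing changed, the EMA has nothing new to average
changed = any(not torch.equal(b, p.detach()) for b, p in zip(before, pg))
print('parameters changed:', changed)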
Thanks for your reply; let me clarify the code. pg contains the parameters to be trained (I freeze some of the model's parameters during training). Here is the detailed code:
from torch_ema import ExponentialMovingAverage
model = ...
for name, para in model.named_parameters():
    if "blocks" in name or "head" in name:
        para.requires_grad_(True)
    else:
        para.requires_grad_(False)
pg = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(pg, lr=0.01, momentum=0.9, weight_decay=5e-5)
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=20, eta_min=0)
ema_model = ExponentialMovingAverage(parameters=pg, decay=0.9999)
best_MAE=10
save_path=...
for epoch in range(args.epochs):
    # train
    print('epoch:', epoch, 'Current learning rate:', optimizer.param_groups[0]['lr'])
    train_loss, train_MAE, tb_writer = train_one_epoch(model=model,
                                                       optimizer=optimizer,
                                                       data_loader=train_loader,
                                                       device=device,
                                                       epoch=epoch,
                                                       scheduler=scheduler,
                                                       csv_filename=args.csv_filename,
                                                       tb_writer=tb_writer)
    scheduler.step()
    ema_model.update()
    # validate
    with ema_model.average_parameters():
        val_loss, val_MAE = evaluate(model=model, data_loader=val_loader, device=device, epoch=epoch)
    if val_MAE < best_MAE:
        best_MAE = val_MAE
        torch.save(ema_model.state_dict(), save_path)
The evaluate function is as follows:
@torch.no_grad()
def evaluate(model, data_loader, device, epoch):
    softceloss_function = SoftCrossEntropy()
    model.eval()
    data_loader = tqdm(data_loader)
    for step, data in enumerate(data_loader):
        images, names, labels = data
        pred = model(images.to(device))
        softlabel = softlabel_function(labels)  # a function to convert labels to softlabel
        loss = softceloss_function(pred, softlabel.to(device))
    val_loss, val_MAE = ...  # calculate loss and MAE
    return val_loss, val_MAE
At the end of each training epoch I run the evaluate function on the validation set, and if val_MAE < best_MAE I save the model's checkpoint. After 20 epochs of training, I select the checkpoint with the best validation performance and test it on the test set.
The results are as follows:
- If I don't use the torch_ema package, i.e. evaluate without the line with ema_model.average_parameters():, the val_MAE of epochs 1, 2, 3 is 6.185, 5.779 and 5.529, respectively.
- If I use the code above with ema_model = ExponentialMovingAverage(parameters=pg, decay=0.9999), i.e. decay set to 0.9999, the val_MAE of epochs 1, 2, 3 is 6.269, 5.878 and 5.548, respectively, which shows that the EMA is actually being applied.
- When I set decay to another value such as 0.999 (or any other value), i.e. ema_model = ExponentialMovingAverage(parameters=pg, decay=0.999), and restart training, the validation results are exactly the same: the val_MAE of epochs 1, 2, 3 is again 6.269, 5.878 and 5.548, respectively.
To summarize, I have two questions:
- Why are the validation results exactly the same when I train with different decay values?
- Since ema_model.state_dict() is different from model.state_dict(), how should I save the ema_model checkpoint and apply it on the test set? (My tentative idea is sketched below.)
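For the second question, here is what I currently have in mind for saving and restoring the EMA weights (just a sketch based on my reading of the torch_ema README; test_loader is a placeholder name, and please correct me if load_state_dict / copy_to is not the intended usage):
# during training: save the EMA state whenever the validation MAE improves
torch.save(ema_model.state_dict(), save_path)

# at test time: rebuild the EMA object, restore its state, and copy the averaged
# weights into the trainable parameters before evaluating on the test set
ema_model = ExponentialMovingAverage(parameters=pg, decay=0.9999)
ema_model.load_state_dict(torch.load(save_path))
ema_model.copy_to(pg)
test_loss, test_MAE = evaluate(model=model, data_loader=test_loader, device=device, epoch=0)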
Looking forward to your reply!