awd-lstm-lm
Switching criterion with non-monotone interval differs between paper and code?
Hello Everyone,
I went through the paper Regularizing and Optimizing LSTM Language Models. In the NT-ASGD algorithm discussed in the paper, for the switch to take place, the current validation loss should be greater than the minimum over the last n intervals, where n is the non-monotone interval.
In the code, however, the implementation suggests that the current validation loss should be greater than the minimum over all but the last n intervals.
if args.optimizer == 'sgd' and 't0' not in optimizer.param_groups[0] and (len(best_val_loss)>args.nonmono and val_loss > min(best_val_loss[:-args.nonmono])):
    print('Switching to ASGD')
    optimizer = torch.optim.ASGD(model.parameters(), lr=args.lr, t0=0, lambd=0., weight_decay=args.wdecay)
I think the condition in the code should be min(best_val_loss[-args.nonmono:]) instead of min(best_val_loss[:-args.nonmono]).
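To make the difference between the two slices concrete, here is a small sketch. The loss values and the nonmono setting are made up for illustration and do not come from a real training run.

```python
# Hypothetical validation losses, oldest first (not from a real run).
best_val_loss = [5.0, 4.6, 4.4, 4.3, 4.35, 4.32]
nonmono = 2  # stands in for args.nonmono

# Slice used in the code: everything except the last `nonmono` entries.
all_but_last_n = best_val_loss[:-nonmono]

# Slice I expected from my reading of the paper: only the last `nonmono` entries.
last_n = best_val_loss[-nonmono:]

print("code compares against:", all_but_last_n, "min =", min(all_but_last_n))
print("my reading compares against:", last_n, "min =", min(last_n))
```

The two thresholds differ, so the two conditions can trigger the switch at different points.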
Please correct me if I am missing something.
Thanks.
I spotted the same problem. Is it possible to have any clarification about it? Thanks
Hi! The paper has a typo: instead of comparing against the last n intervals, those intervals are masked out and the current loss is compared against the remaining set. Check #13 for confirmation. Hope this helps.
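A minimal sketch of the check as the snippet in the question applies it, assuming the current loss is appended to the history only after the check runs. The function name should_switch and the loss values are hypothetical and only there for illustration.

```python
def should_switch(history, val_loss, nonmono):
    """Return True when val_loss is worse than the best loss recorded
    before the last `nonmono` evaluations (those last ones are masked)."""
    if len(history) <= nonmono:
        # Not enough evaluations yet; mirrors len(best_val_loss) > args.nonmono.
        return False
    return val_loss > min(history[:-nonmono])


history = []       # plays the role of best_val_loss
switch_at = None   # evaluation index at which the switch would fire
for step, val_loss in enumerate([5.0, 4.6, 4.4, 4.3, 4.35, 4.32, 4.31]):
    if switch_at is None and should_switch(history, val_loss, nonmono=2):
        switch_at = step
    history.append(val_loss)  # assumption: appended after the check

print("switch would fire at evaluation", switch_at)
```

With these made-up values the check first fires at evaluation 6, where 4.31 is compared against 4.3, the best loss among the entries older than the two masked ones.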
@angeliand Thanks a lot!!