awd-lstm-lm Switching criteria with non-monotone interval has difference in paper and code?

Open yashu-seth opened this issue 6 years ago • 3 comments

Hello Everyone,

I went through the paper Regularizing and Optimizing LSTM Language Models. In the NT-ASGD algorithm discussed in the paper, the switch takes place when the current validation loss is greater than the losses of the last n intervals, where n is the non-monotone interval.

The code, however, suggests that the current validation loss should be greater than the minimum over all but the last n intervals.

if (args.optimizer == 'sgd' and 't0' not in optimizer.param_groups[0] and
        len(best_val_loss) > args.nonmono and
        val_loss > min(best_val_loss[:-args.nonmono])):
    print('Switching to ASGD')
    optimizer = torch.optim.ASGD(model.parameters(), lr=args.lr, t0=0, lambd=0., weight_decay=args.wdecay)

I think the condition in the code should use min(best_val_loss[-args.nonmono:]) instead of min(best_val_loss[:-args.nonmono]).

Please correct me if I am missing something.
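To make the difference between the two slices concrete, here is a minimal sketch. The loss values and the plain variable nonmono (standing in for args.nonmono) are made up for illustration, and it assumes the current val_loss has not yet been appended to best_val_loss when the check runs.

```python
# Hypothetical validation-loss history, oldest first (values are invented).
best_val_loss = [5.0, 4.6, 4.3, 4.1, 4.2, 4.15]
nonmono = 2      # stands in for args.nonmono
val_loss = 4.12  # hypothetical loss of the current evaluation

# Slice used in the code: everything EXCEPT the last `nonmono` entries.
all_but_last = best_val_loss[:-nonmono]  # [5.0, 4.6, 4.3, 4.1]

# Slice proposed above: ONLY the last `nonmono` entries.
last_only = best_val_loss[-nonmono:]     # [4.2, 4.15]

enough_history = len(best_val_loss) > nonmono
code_switch = enough_history and val_loss > min(all_but_last)
proposed_switch = enough_history and val_loss > min(last_only)

print("code condition switches:", code_switch)
print("proposed condition switches:", proposed_switch)
```

With these numbers the two conditions disagree: 4.12 is worse than the older best of 4.1, but better than both of the two most recent losses.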

Thanks.

Sep 17 '18 08:09 yashu-seth

I spotted the same problem. Is it possible to have any clarification about it? Thanks

Dec 03 '18 14:12 christian-5-28

Hi! The paper has a typo: the last n intervals are not the ones compared against. They are masked out, and we compare against the remaining set. See #13 for confirmation. Hope this helps.
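As a rough illustration of the masked comparison described here, the sketch below replays a list of validation losses and reports the first evaluation at which the check would fire. The helper names should_switch and first_switch_epoch and the loss values are hypothetical, not part of the repository, and the sketch assumes each loss is appended to the history only after the check.

```python
def should_switch(history, val_loss, nonmono):
    """True if val_loss is worse than the best loss recorded before
    the last `nonmono` evaluations (those last entries are masked out)."""
    return len(history) > nonmono and val_loss > min(history[:-nonmono])


def first_switch_epoch(losses, nonmono):
    """Return the index of the first evaluation that triggers the switch,
    or None if it never triggers."""
    history = []
    for epoch, val_loss in enumerate(losses):
        if should_switch(history, val_loss, nonmono):
            return epoch
        history.append(val_loss)  # recorded only after the check
    return None


# Invented loss curve: improves, then stalls above its earlier best of 4.2.
losses = [5.0, 4.5, 4.2, 4.3, 4.25, 4.4]
switch_at = first_switch_epoch(losses, nonmono=2)
print("switch at evaluation:", switch_at)
```

Under these assumptions the switch fires only once the earlier best (4.2) is older than the masked window and the current loss is still worse than it, so a loss curve that keeps improving never triggers it.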

Jan 31 '19 15:01 angeliand

@angeliand Thanks a lot!!

Feb 01 '19 20:02 yashu-seth
