word2vec-pytorch

About the loss

Open hsx479 opened this issue 5 years ago • 1 comment

Hello! Thanks for your code! Have you observed the loss? I downloaded the code and ran it, but the loss doesn't seem to converge. It descends rapidly at first, but as the epochs go on it looks like a cosine function, and its amplitude increases too.

hsx479 avatar Jun 14 '19 05:06 hsx479

However, the loss doesn't seem to converge. It descends rapidly at first, but as the epochs go on it looks like a cosine function, and its amplitude increases too.

The fluctuation in the loss is probably caused by the corresponding variation in the learning rate. The code uses a cosine annealing schedule with warm restarts: the learning rate is lowered from its starting value to zero over the course of each epoch, then reset to the starting value at the beginning of the next one. This has been found to work well in some contexts, because the low learning rate at the end of each cycle lets the optimizer settle into a local minimum, while the periodic reset gives it a chance to escape that minimum and find a better one.
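To make the mechanism concrete, here's a minimal sketch of that kind of schedule using PyTorch's built-in `CosineAnnealingWarmRestarts`. The model, learning rate, and batch count below are illustrative placeholders, not the repo's actual settings:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Placeholder model and optimizer; substitute the repo's skip-gram model.
model = nn.Linear(100, 100)
optimizer = optim.SGD(model.parameters(), lr=0.025)  # lr value is a guess

batches_per_epoch = 1000  # assumed; depends on corpus and batch size
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=batches_per_epoch)

for epoch in range(5):
    for _ in range(batches_per_epoch):
        optimizer.zero_grad()
        loss = model(torch.randn(32, 100)).pow(2).mean()  # dummy loss
        loss.backward()
        optimizer.step()
        # lr follows a cosine curve from 0.025 down toward 0 within
        # the epoch, then jumps back to 0.025 at the next restart.
        scheduler.step()
```

Since the learning rate jumps back up at every restart, a loss curve under this schedule can oscillate with the same period as the schedule, which matches what you're seeing.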

Personally I've had more success with plain SGD, with the learning rate decayed exponentially after every epoch.
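For comparison, here's a sketch of that alternative using `ExponentialLR` stepped once per epoch (again, the hyperparameters are placeholders):

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ExponentialLR

model = nn.Linear(100, 100)  # placeholder model again
optimizer = optim.SGD(model.parameters(), lr=0.025)
scheduler = ExponentialLR(optimizer, gamma=0.9)  # decay factor is a guess

for epoch in range(5):
    for _ in range(1000):  # assumed batches per epoch
        optimizer.zero_grad()
        loss = model(torch.randn(32, 100)).pow(2).mean()  # dummy loss
        loss.backward()
        optimizer.step()
    # Stepped once per epoch: lr <- lr * gamma, so the learning rate
    # decreases monotonically instead of oscillating.
    scheduler.step()
```

The loss under this schedule should decrease much more smoothly, at the possible cost of getting stuck in a local minimum that a restart schedule might have escaped.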

ljjb avatar Aug 23 '19 05:08 ljjb