
About learning rate scheduler

Open · deeperlearner opened this issue 6 months ago · 0 comments

In Section 4.5 of the paper:

All models are trained using an SGD optimizer with an initial learning rate of 1e-1 and a batch size of 512. The learning rate is divided by 10 at 30k, 60k, 90k training iterations.
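
For concreteness, a minimal sketch of that schedule, assuming the repo uses PyTorch (the dummy model and data below are placeholders, not the paper's network):

```python
import torch

# Placeholder model; substitute the actual face-recognition backbone.
model = torch.nn.Linear(512, 512)

# Settings quoted from Section 4.5: SGD with an initial lr of 1e-1.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# Divide the lr by 10 at 30k, 60k, and 90k *iterations* (not epochs),
# so the scheduler must be stepped once per training iteration.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30_000, 60_000, 90_000], gamma=0.1
)

for iteration in range(100_000):
    x = torch.randn(512, 512)       # stand-in for a real batch of size 512
    loss = model(x).pow(2).mean()   # stand-in for the SRT loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the iteration-based schedule
```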

Since this paper is about losses, have there been any experiments with learning rate schedulers? In my experiment I am using the SRT loss, and the loss keeps dropping at a learning rate of 1e-1. Any suggestions on when it is best to divide the lr by 10?
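
For reference, one data-driven alternative to fixed milestones would be a plateau-based scheduler such as PyTorch's ReduceLROnPlateau, which cuts the lr only once the monitored loss stops improving. A minimal sketch, where the model, monitored metric, and epoch count are placeholders:

```python
import torch

model = torch.nn.Linear(512, 512)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# Divide the lr by 10 when the monitored loss fails to improve for
# `patience` consecutive validation checks, instead of at fixed iterations.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5
)

for epoch in range(50):
    # ... training loop: forward, backward, optimizer.step() ...
    val_loss = torch.randn(()).abs().item()  # stand-in for a real metric
    scheduler.step(val_loss)  # pass the monitored metric each check
```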

Thank you!

deeperlearner · Aug 02 '24 10:08