Self-restrained-Triplet-Loss
About learning rate scheduler
In section 4.5 of the paper:
All models are trained using an SGD optimizer with an initial learning rate of 1e-1 and batch size of 512. The learning rate is divided by 10 at 30k, 60k, 90k training iterations.
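For reference, here is a minimal sketch of how I read that schedule in PyTorch (an assumption on my side: a MultiStepLR with step() called once per iteration, so the milestones are iteration counts; the model, momentum, and loop length are placeholders, not taken from the paper or this repo):

```python
import torch

# Placeholder model; the actual backbone and SRT loss come from this repo.
model = torch.nn.Linear(512, 512)

# SGD with the paper's initial learning rate of 1e-1.
# The momentum value is my assumption, not stated in the quoted passage.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9)

# Divide the lr by 10 (gamma=0.1) at 30k, 60k, and 90k iterations.
# Calling scheduler.step() once per training iteration makes the
# milestones iteration counts rather than epoch counts.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30_000, 60_000, 90_000], gamma=0.1
)

for iteration in range(100_000):
    # ... forward pass, SRT loss, loss.backward(), optimizer.step() ...
    scheduler.step()
```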
Since this paper is about losses, have there been any experiments on learning rate schedulers? In my experiment, I am using the SRT loss, and the loss keeps dropping with a learning rate of 1e-1. Any suggestions on when it is best to divide the lr by 10?
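In case an adaptive schedule is an acceptable alternative, would something like ReduceLROnPlateau be reasonable here? A sketch of what I mean (not the paper's method; the factor and patience values are guesses on my part):

```python
import torch

model = torch.nn.Linear(512, 512)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9)

# Divide the lr by 10 (factor=0.1) only once the monitored loss has
# stopped improving for `patience` consecutive checks, instead of at
# fixed iteration milestones.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5
)

for epoch in range(50):
    # ... training loop producing an average SRT loss for this epoch ...
    avg_loss = 0.0  # placeholder for the real epoch loss
    scheduler.step(avg_loss)
```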
Thank you!