driving-in-the-matrix
Questions on learning rate schedule
Hello,
I am having trouble reproducing the results of this paper. Could you clarify the learning rate schedule used? Thanks a lot!
The paper mentions: "Training began with a learning rate of 1e-3 and decreased by a factor of 10 after every 10k iterations until a minimum learning rate of 1e-8 was achieved."
My questions are:
- The learning rate schedule in mxnet is defined in epochs, not iterations. At least incubator-mxnet/example/rcnn/train_end2end.py doesn't accept an iterations argument.
- Image flipping for data augmentation is on by default in mxnet, which means one epoch over 10k images is 20k iterations. Does this definition of iteration match the one used in the paper, or should flipping be turned off?
- Starting from 1e-3 and decreasing by a factor of 10 every 10k iterations until 1e-8 means the schedule bottoms out after 50k iterations. So most later samples in the 200k dataset would be seen at the minimum learning rate (1e-8). Wouldn't those samples be more or less ignored in training, given such a low learning rate?
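To make the third question concrete, here is a minimal plain-Python sketch of the schedule as I read it from the paper (this is my own reading, not the actual mxnet scheduler code):

```python
def lr_at(iteration, base_lr=1e-3, factor=0.1, step=10_000, min_lr=1e-8):
    """Learning rate after `iteration` updates, as described in the paper:
    start at base_lr, multiply by `factor` every `step` iterations,
    floored at min_lr. All parameter names here are my own."""
    lr = base_lr * (factor ** (iteration // step))
    return max(lr, min_lr)

# Five decays (1e-3 -> 1e-8) complete by iteration 50k, so under this
# reading every later update in a 200k-sample run uses the floor value.
print(lr_at(0))        # base learning rate, 1e-3
print(lr_at(50_000))   # floor reached, ~1e-8
print(lr_at(199_999))  # still ~1e-8 for the remainder of the data
```

If this sketch matches your intended schedule, then roughly three quarters of a 200k-iteration run happens at 1e-8, which is what prompted the question.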
If possible, could you also share the training parameter settings file or any other changes you made to the mxnet rcnn example? That would clear up the remaining details when reproducing the results of this paper.
P.S. I didn't use the Docker environment due to some gcc issues, but I don't think that's the problem, since all the changes in that Docker image have already been merged into mainline.
Thank you very much!
-Liang