
Default learning rate and other hyperparameters

Open msultan opened this issue 8 years ago • 1 comment

Based on some testing, I am starting to think that the default learning rate of 1e-4 is probably too low for our applications, and it might be better to bump it up to 5e-3 or even 1e-2. This is mostly based on the empirical observation that higher learning rates tend to produce "similar"-looking models even across differing architectures, batch sizes, and numbers of epochs. It also helps that we use the Adam optimizer, which can attenuate the effective step size as training proceeds.

msultan avatar Dec 09 '17 17:12 msultan
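For context on why Adam tolerates a larger base rate: each parameter's step is divided by a running estimate of the gradient's second moment, which damps the update wherever gradients are large or noisy. A minimal scalar sketch of the update rule (the `lr`, `b1`, `b2`, and `eps` values below are the usual Adam defaults plus the proposed 5e-3 rate, not values taken from this repo):

```python
def adam_step(theta, grad, m, v, t, lr=5e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns new (theta, m, v)."""
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)              # bias-correct the moment estimates
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)  # attenuated step
    return theta, m, v

# Usage: minimize f(theta) = theta**2 from theta = 1.0
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

Note that because `m_hat / sqrt(v_hat)` is roughly the sign of the gradient, the per-step movement is on the order of `lr` itself, which is why the choice of base rate still matters despite the adaptivity.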

We could also look at an adaptive learning rate as a function of epoch.

brookehus avatar Dec 09 '17 22:12 brookehus