Added Cosine Annealing Warmup Restarts With Decay scheduler
This adds a learning-rate scheduler that is a variant of the CosineAnnealingWarmRestarts scheduler and of OneCycleLR. The added scheduler can decrease, over time, the maximum learning rate it warms up to, and can increase the length of the decay cycle after each restart. If we set T_warmup=0 and gamma=1.0, we recover the base CosineAnnealingWarmRestarts.
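As a rough illustration of the idea (this is a minimal sketch, not the exact code in the PR; the linear warmup shape and the class internals are assumptions, while T_0, T_mult, T_warmup and gamma mirror the parameter names above), a per-step version could look like:

```python
import math

from torch.optim.lr_scheduler import _LRScheduler


class CosineAnnealingWarmupRestartsWithDecay(_LRScheduler):
    """Cosine annealing with warm restarts, plus:
    - a linear warmup of ``T_warmup`` steps at the start of every cycle,
    - a ``gamma`` factor that shrinks the peak learning rate after each restart,
    - a ``T_mult`` factor that stretches the cycle length after each restart.
    With T_warmup=0 and gamma=1.0 this reduces to per-step CosineAnnealingWarmRestarts.
    """

    def __init__(self, optimizer, T_0, T_mult=1, T_warmup=0, gamma=1.0,
                 eta_min=0.0, last_epoch=-1):
        self.T_0 = T_0            # length (in scheduler steps) of the first cycle
        self.T_mult = T_mult      # cycle-length multiplier applied after each restart
        self.T_warmup = T_warmup  # warmup steps at the start of each cycle
        self.gamma = gamma        # peak-LR decay factor applied after each restart
        self.eta_min = eta_min
        super().__init__(optimizer, last_epoch)

    def _locate(self, step):
        # Map a global step to (cycle index, step within cycle, cycle length).
        if self.T_mult == 1:
            return step // self.T_0, step % self.T_0, self.T_0
        n = int(math.log(step / self.T_0 * (self.T_mult - 1) + 1, self.T_mult))
        start = self.T_0 * (self.T_mult ** n - 1) // (self.T_mult - 1)
        return n, step - start, self.T_0 * self.T_mult ** n

    def get_lr(self):
        cycle, t_cur, t_i = self._locate(max(self.last_epoch, 0))
        lrs = []
        for base_lr in self.base_lrs:
            # Peak LR of the current cycle, decayed by gamma after every restart.
            peak = self.eta_min + (base_lr - self.eta_min) * self.gamma ** cycle
            if t_cur < self.T_warmup:
                # Linear warmup from eta_min up to the (decayed) peak.
                lr = self.eta_min + (peak - self.eta_min) * (t_cur + 1) / self.T_warmup
            else:
                # Cosine annealing from the peak back down to eta_min.
                progress = (t_cur - self.T_warmup) / max(1, t_i - self.T_warmup)
                lr = self.eta_min + (peak - self.eta_min) * 0.5 * (1 + math.cos(math.pi * progress))
            lrs.append(lr)
        return lrs
```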
This was based on the observation that CREStereo generally reacts better to a constantly changing learning rate than to the default schedule proposed in the paper. Below is an example of a schedule generated with gamma < 1.0 and T_warmup > 0.
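As a concrete (hypothetical) way to produce such a schedule with the sketch above; the values are placeholders, not the ones used for training:

```python
import torch

# Placeholder model parameters and hyper-parameters, purely for illustration.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=4e-4)
scheduler = CosineAnnealingWarmupRestartsWithDecay(
    optimizer, T_0=1000, T_mult=2, T_warmup=200, gamma=0.8, eta_min=1e-6
)

lrs = []
for _ in range(7000):
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
# The trace shows peaks of ~4e-4, ~3.2e-4 and ~2.56e-4 over cycles of 1000, 2000 and 4000 steps,
# each cycle starting with a 200-step linear warmup.
```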
To avoid adding unnecessary complexity in the references, it's worth starting from the paper's proposal first. If you can't reproduce the results and you need an extra boost, we could consider adding new schedulers. wdyt?
@datumbox There are multiple people stating on the original repo that they have trouble reproducing the performance claimed by the authors with the training code provided by them.
This is something I have noticed as well during my training runs. It can, however, be ameliorated with much simpler scheduling. We can skip this PR altogether; I do not have any strong opinions on it.
@TeodorPoncu Thanks for the references! It's great that you tried reproducing it already.
From what I understand, the nan value appears in the early epochs, well before any restart. If that's the case, then the modified scheduler might have less of an effect on the nans, and it might be beneficial to do warmup at the beginning using one of the standard schedulers such as LinearLR or ConstantLR. If that doesn't work out, then we should consider merging this PR.
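For example, a warmup prefix built from stock PyTorch schedulers could look roughly like this (the main decay scheduler and all values below are only illustrative):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

# Placeholder parameters and step counts, purely for illustration.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=4e-4)

warmup_steps, total_steps = 1_000, 100_000

# Linear warmup for the first `warmup_steps` steps, then hand over to the main decay.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.01, total_iters=warmup_steps),
        CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps, eta_min=1e-6),
    ],
    milestones=[warmup_steps],
)
```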
In general it's not a big problem to add primitives to the references, as long as they are necessary. For tasks such as Classification we are more forgiving, because some techniques are popular across multiple models and because we constantly do work on them. For Depth Perception, which is still a prototype, it would be great if we could keep things simple, as long as we can reproduce the accuracies and we don't have a good reason to complicate things. So if you really need it, we can definitely add it. If you manage to work around it, that's a plus. :)
@datumbox I have already managed to work around both the NaN and some of the performance drop by simply adjusting the point at which the learning rate starts decaying, and by replacing the linear decay with a cosine one. Therefore, the scheduler is just a minor nice-to-have / bonus.
The benefit of this schedule shows up in fine-tuning scenarios, and only as faster convergence and a very minor performance boost.
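For reference, one way to express that kind of workaround with a stock PyTorch scheduler (the numbers here are purely illustrative, not the actual training recipe):

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

# Placeholder parameters and step counts, purely for illustration.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=4e-4)

scheduler = OneCycleLR(
    optimizer,
    max_lr=4e-4,
    total_steps=300_000,
    pct_start=0.01,         # move the point where the learning rate starts decaying
    anneal_strategy="cos",  # cosine decay instead of the paper's linear decay
    cycle_momentum=False,   # keep Adam's betas fixed rather than cycling them
)
```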
Hi @TeodorPoncu!
Thank you for your pull request.
We require contributors to sign our Contributor License Agreement, and yours needs attention.
You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
@TeodorPoncu shall we close the PR given you managed to train the models and deploy the weights?
Yes, this could be closed.