MiniCPM
WSD scheduler, Decay part question
Hi! Thank you for your work and for sharing the technical report with us. I have a question. In the figure, each experiment shows a constant learning rate (lr) after the decay phase.
However, the definition says that the decay and last phase should be exponential.
Is this caused by the "cutoff steps T" that appears in the equation for exponential decay? Thank you very much! I really like the ideas behind this work!
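
To make the question concrete, here is a minimal sketch of how I currently read the schedule. The function name `wsd_lr`, the half-life parameterization of the exponential, and the `min_lr` floor are all my own assumptions for illustration and are not taken from the report. The floor is just one way a flat segment could appear after the decay; I am not claiming this is how the experiments were run.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_end, half_life, min_lr=0.0):
    """Hypothetical Warmup-Stable-Decay schedule (my reading, not the report's code).

    step         : current optimizer step
    max_lr       : peak learning rate used in the stable phase
    warmup_steps : number of linear warmup steps
    stable_end   : step at which the stable phase ends and decay begins
    half_life    : assumed number of steps for the lr to halve during decay
    min_lr       : assumed floor; the lr never drops below this value
    """
    if step < warmup_steps:
        # Warmup: linear ramp from 0 to max_lr.
        return max_lr * step / warmup_steps
    if step < stable_end:
        # Stable: constant lr.
        return max_lr
    # Decay: exponential, halving every `half_life` steps after stable_end.
    decayed = max_lr * 0.5 ** ((step - stable_end) / half_life)
    # With a nonzero floor, the curve becomes flat once `decayed` falls below it.
    return max(decayed, min_lr)


if __name__ == "__main__":
    for s in (50, 500, 1100, 1500, 5000):
        print(s, wsd_lr(s, max_lr=0.01, warmup_steps=100, stable_end=1000,
                        half_life=100, min_lr=0.001))
```

If the schedule in the experiments behaves like this, the constant tail in the figure would simply be the floor. If there is no such floor, I would expect the curve to keep decreasing until the end of training.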