nanoGPT
Does the order of weight decay parameters matter?
In this line of the configure_optimizers method, the list of parameters is sorted. I'm just wondering: does the order of params in a group matter?
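A minimal pure-Python sketch of the two effects ordering could have, under stated assumptions. It assumes the parameter names are first collected into Python sets (whose iteration order is not guaranteed to be stable across runs) and that the optimizer is elementwise, like AdamW. In that case the per-parameter update should not depend on order. Anything keyed by position can depend on it, though: PyTorch's Optimizer.state_dict() refers to parameters by their integer index in group order. Sorting is one plausible way to make that reproducible, not necessarily the author's stated reason. The parameter names, values, and helper functions below are hypothetical, and plain floats stand in for tensors.

```python
# Hypothetical parameter names/values; plain floats stand in for tensors.
params = {
    "transformer.h.0.attn.c_attn.weight": 1.0,
    "lm_head.weight": -2.0,
    "transformer.wte.weight": 0.5,
}
decay = set(params)  # set iteration order is not guaranteed across runs


def decoupled_weight_decay(values, order, lr=0.1, wd=0.01):
    # Hypothetical AdamW-style decoupled decay step: p <- p - lr * wd * p,
    # applied to each parameter independently of the others.
    out = dict(values)
    for name in order:
        out[name] = out[name] - lr * wd * out[name]
    return out


def positional_state(order):
    # Hypothetical stand-in for optimizer state keyed by a parameter's
    # position in the group (as state_dict() does with integer indices).
    return {i: name for i, name in enumerate(order)}


order_a = sorted(decay)
order_b = list(reversed(order_a))

# Elementwise update: same result regardless of order.
same_update = decoupled_weight_decay(params, order_a) == decoupled_weight_decay(params, order_b)
# Position-keyed state: differs when the order differs.
same_state = positional_state(order_a) == positional_state(order_b)
print(same_update, same_state)  # True False
```

Under these assumptions, the order within a group would matter mainly for saving and reloading optimizer state, not for the update itself.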