Will Merrill
My bad, completely missed this issue until now! I had a private repo with some preliminary experiments I was running with the library, mostly focused on language modeling and grammar...
https://pytorch.org/docs/stable/notes/cuda.html
> However, the perplexity (and thus loss) seems to be converging to a local/global minimum. If the weights are converging to a local minimum, the gradient norm should also be...
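For reference, a minimal sketch of how one might actually check this claim in a standard PyTorch training loop; the `total_grad_norm` helper and the `model` name are illustrative, not from this thread:

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    # L2 norm over all parameter gradients; this should approach zero
    # if the weights are genuinely settling into a local minimum.
    norms = [p.grad.detach().norm(2)
             for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms), 2).item()

# Inside the training step, after loss.backward():
#   print(f"step grad norm: {total_grad_norm(model):.6f}")
```

If the logged norm plateaus well above zero while the loss flattens, that points at something other than convergence to a minimum (e.g. a learning-rate or data issue) rather than a true stationary point.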
Hi all - sorry I missed this thread.

> Also, are you aware of the perils of having larger parameters? Should we settle for larger parameters? Or should we strive...
> Please check our loss curve. Would you happen to have any suggestions or solutions to this issue?

@SuperBruceJia It's hard to say from just the loss and grad norm...