Will Merrill
My bad, completely missed this issue until now! I had a private repo with some preliminary experiments I was running with the library, mostly focused on language modeling and grammar...
https://pytorch.org/docs/stable/notes/cuda.html
> However, the perplexity (and thus loss) seems to be converging to a local/global minimum. If the weights are converging to a local minimum, the gradient norm should also be...
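For reference, a minimal sketch of how one might actually check this claim in a standard PyTorch training loop; the `total_grad_norm` helper and the `model` name are illustrative, not from this thread:

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    # L2 norm over all parameter gradients; this should approach zero
    # if the weights are genuinely settling into a local minimum.
    norms = [p.grad.detach().norm(2)
             for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms), 2).item()

# Inside the training step, after loss.backward():
#   print(f"step grad norm: {total_grad_norm(model):.6f}")
```

If the logged norm plateaus well above zero while the loss flattens, that points at something other than convergence to a minimum (e.g. a learning-rate or data issue) rather than a true stationary point.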
Hi all - sorry I missed this thread.

> Also, are you aware of the perils of having larger parameters? Should we settle for larger parameters? Or should we strive...
> Please check our loss curve. Would you happen to have any suggestions or solutions to this issue?

@SuperBruceJia It's hard to say from just the loss and grad norm...