CUDA-DNN-MNIST
Training only converges with a 1e-6 learning rate, which is pretty low for an MNIST MLP. I tried using 0.01 but it doesn't converge at all. Can you tell me how you are calculating the backward pass?
Hi @maomran, thanks for checking out my work! :)
I also ran into this issue while I was implementing this example. Sadly, I haven't figured out why it happens... My goal for this project was to focus mainly on the CUDA aspect and then benchmark my code on different GPUs. That's why I left such a small learning rate, which was fine for me to pass my assignment successfully :) I would like to get back to this and debug it properly, but I won't find enough time for it in the next few weeks :(
Getting back to your question: each layer has its own backward() method. Here you can find an example for the Dense layer:
https://github.com/jpowie01/CUDA-DNN-MNIST/blob/master/src/layers/dense.cpp#L61
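
The math behind that Dense backward pass looks roughly like the simplified CPU sketch below. The names (denseBackward, DenseGrads) are just for illustration, not the actual CUDA code from the repo:

```cpp
#include <vector>

// Simplified CPU sketch of a dense layer's backward pass (the repo does this
// with CUDA kernels). Shapes: X is [batch x in], W is [in x out], dY is [batch x out].
struct DenseGrads {
    std::vector<float> dX;  // [batch x in] - gradients passed to the previous layer
    std::vector<float> dW;  // [in x out]   - weight gradients used by the optimizer
    std::vector<float> db;  // [out]        - bias gradients
};

DenseGrads denseBackward(const std::vector<float>& X, const std::vector<float>& W,
                         const std::vector<float>& dY, int batch, int in, int out) {
    DenseGrads g{std::vector<float>(batch * in, 0.0f),
                 std::vector<float>(in * out, 0.0f),
                 std::vector<float>(out, 0.0f)};
    for (int b = 0; b < batch; ++b) {
        for (int o = 0; o < out; ++o) {
            float dy = dY[b * out + o];
            g.db[o] += dy;                               // db = sum over the batch of dY
            for (int i = 0; i < in; ++i) {
                g.dW[i * out + o] += X[b * in + i] * dy; // dW = X^T * dY
                g.dX[b * in + i] += dy * W[i * out + o]; // dX = dY * W^T
            }
        }
    }
    return g;
}
```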
This method takes the gradients from the "upper" layer and returns its own gradients. The whole flow is controlled by the Model, which computes the initial loss and passes it through all the layers. Here you can find the Sequential model:
https://github.com/jpowie01/CUDA-DNN-MNIST/blob/master/src/models/sequential.cu#L26
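
Conceptually, the backward flow there is just the per-layer backward() calls chained in reverse order, starting from the loss gradient. Here's a hedged sketch with illustrative Layer/Tensor types (the repo's real classes differ):

```cpp
#include <vector>

using Tensor = std::vector<float>;  // stand-in for the repo's tensor class

// Hypothetical layer interface, just to show the control flow.
struct Layer {
    virtual Tensor forward(const Tensor& input) = 0;
    // Takes gradients from the "upper" layer, returns gradients for the layer below.
    virtual Tensor backward(const Tensor& upstreamGradients) = 0;
    virtual ~Layer() = default;
};

// Gradient of the loss w.r.t. the network output; for softmax + cross-entropy
// this is simply (predictions - labels).
Tensor lossGradient(const Tensor& predictions, const Tensor& labels) {
    Tensor grad(predictions.size());
    for (size_t i = 0; i < predictions.size(); ++i)
        grad[i] = predictions[i] - labels[i];
    return grad;
}

// One training step: forward through all layers, compute the initial loss
// gradient, then push it backwards through the layers in reverse order.
void trainStep(std::vector<Layer*>& layers, Tensor input, const Tensor& labels) {
    for (Layer* layer : layers)
        input = layer->forward(input);

    Tensor grad = lossGradient(input, labels);

    for (auto it = layers.rbegin(); it != layers.rend(); ++it)
        grad = (*it)->backward(grad);  // each layer consumes upstream grads, returns its own
}
```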
What's more, the Model knows which optimizer to use. In the current implementation you can find an SGD optimizer:
https://github.com/jpowie01/CUDA-DNN-MNIST/blob/master/src/optimizers/sgd.cpp
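
The SGD step itself is then just `weights -= learningRate * gradients` for every parameter tensor. A minimal host-side sketch (the repo runs this as a CUDA kernel):

```cpp
#include <vector>

// Plain SGD update over a flattened weight matrix.
void sgdUpdate(std::vector<float>& weights, const std::vector<float>& gradients,
               float learningRate) {
    for (size_t i = 0; i < weights.size(); ++i)
        weights[i] -= learningRate * gradients[i];
}
```

One thing that might be worth checking while debugging: whether the gradients get averaged over the batch anywhere. If they are only summed, their magnitude scales with the batch size, which is a common reason why only a tiny learning rate like 1e-6 converges - just a guess though, I haven't verified it.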
I hope this helps if you would like to debug it on your own. Don't hesitate to ask any other questions! :)