CUDA-DNN-MNIST
Training only converges with a 1e-6 learning rate, which is pretty low for an MNIST MLP. I tried using 0.01 but it doesn't converge at all. Can you tell me how you are calculating the backward pass?
Hi @maomran, thanks for checking out my work! :)
I also ran into this issue while I was implementing this example. Sadly, I haven't figured out why it happens... My goal for this project was to focus mainly on the CUDA aspect and then benchmark my code on different GPUs. That's why I left such a small learning rate, which was fine for me to pass my assignment successfully :) I would like to get back to this and debug it properly, but I won't find enough time for it in the next few weeks :(
Getting back to your question: each layer has its own backward() method. Here you can find an example for the Dense layer:
https://github.com/jpowie01/CUDA-DNN-MNIST/blob/master/src/layers/dense.cpp#L61
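
The math behind that Dense backward pass looks roughly like the simplified CPU sketch below. The names (denseBackward, DenseGrads) are just for illustration, not the actual CUDA code from the repo:

```cpp
#include <vector>

// Simplified CPU sketch of a dense layer's backward pass (the repo does this
// with CUDA kernels). Shapes: X is [batch x in], W is [in x out], dY is [batch x out].
struct DenseGrads {
    std::vector<float> dX;  // [batch x in] - gradients passed to the previous layer
    std::vector<float> dW;  // [in x out]   - weight gradients used by the optimizer
    std::vector<float> db;  // [out]        - bias gradients
};

DenseGrads denseBackward(const std::vector<float>& X, const std::vector<float>& W,
                         const std::vector<float>& dY, int batch, int in, int out) {
    DenseGrads g{std::vector<float>(batch * in, 0.0f),
                 std::vector<float>(in * out, 0.0f),
                 std::vector<float>(out, 0.0f)};
    for (int b = 0; b < batch; ++b) {
        for (int o = 0; o < out; ++o) {
            float dy = dY[b * out + o];
            g.db[o] += dy;                               // db = sum over the batch of dY
            for (int i = 0; i < in; ++i) {
                g.dW[i * out + o] += X[b * in + i] * dy; // dW = X^T * dY
                g.dX[b * in + i] += dy * W[i * out + o]; // dX = dY * W^T
            }
        }
    }
    return g;
}
```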
This method takes the gradients from the "upper" layer and returns its own gradients. The whole flow is controlled by the Model, which computes the initial loss and passes it through all the layers. Here you can find the Sequential model:
https://github.com/jpowie01/CUDA-DNN-MNIST/blob/master/src/models/sequential.cu#L26
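
Conceptually, the backward flow there is just the per-layer backward() calls chained in reverse order, starting from the loss gradient. Here's a hedged sketch with illustrative Layer/Tensor types (the repo's real classes differ):

```cpp
#include <vector>

using Tensor = std::vector<float>;  // stand-in for the repo's tensor class

// Hypothetical layer interface, just to show the control flow.
struct Layer {
    virtual Tensor forward(const Tensor& input) = 0;
    // Takes gradients from the "upper" layer, returns gradients for the layer below.
    virtual Tensor backward(const Tensor& upstreamGradients) = 0;
    virtual ~Layer() = default;
};

// Gradient of the loss w.r.t. the network output; for softmax + cross-entropy
// this is simply (predictions - labels).
Tensor lossGradient(const Tensor& predictions, const Tensor& labels) {
    Tensor grad(predictions.size());
    for (size_t i = 0; i < predictions.size(); ++i)
        grad[i] = predictions[i] - labels[i];
    return grad;
}

// One training step: forward through all layers, compute the initial loss
// gradient, then push it backwards through the layers in reverse order.
void trainStep(std::vector<Layer*>& layers, Tensor input, const Tensor& labels) {
    for (Layer* layer : layers)
        input = layer->forward(input);

    Tensor grad = lossGradient(input, labels);

    for (auto it = layers.rbegin(); it != layers.rend(); ++it)
        grad = (*it)->backward(grad);  // each layer consumes upstream grads, returns its own
}
```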
What's more, the Model knows which optimizer to use. In the current implementation you can find an SGD optimizer:
https://github.com/jpowie01/CUDA-DNN-MNIST/blob/master/src/optimizers/sgd.cpp
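
The SGD step itself is then just `weights -= learningRate * gradients` for every parameter tensor. A minimal host-side sketch (the repo runs this as a CUDA kernel):

```cpp
#include <vector>

// Plain SGD update over a flattened weight matrix.
void sgdUpdate(std::vector<float>& weights, const std::vector<float>& gradients,
               float learningRate) {
    for (size_t i = 0; i < weights.size(); ++i)
        weights[i] -= learningRate * gradients[i];
}
```

One thing that might be worth checking while debugging: whether the gradients get averaged over the batch anywhere. If they are only summed, their magnitude scales with the batch size, which is a common reason why only a tiny learning rate like 1e-6 converges - just a guess though, I haven't verified it.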
I hope this helps if you would like to debug it on your own. Don't hesitate to ask any other questions! :)