
ReLU in backprop

Open Arech opened this issue 9 years ago • 2 comments

Hi Dan! Thanks for sharing your code. I was trying to add ReLU support to DeepLearningToolbox when I found this repository. Would you be so kind as to help me with my questions?

  1. I'm looking at the backprop implementation for ReLU and can't understand one thing about handling the ReLU at the output layer; see line 11 of nnbp.m. It looks like you set the derivative of the loss function w.r.t. the total input of a ReLU unit in the output layer to the same formula as for a plain linear unit, ignoring the fact that the ReLU's derivative is zero for negative input. Shouldn't the ReLU in the output layer get special handling, as it does in the hidden layers? Something like
case 'relu'
    d{n} = - nn.e .* single(nn.a{n} > 0);

instead of

case {'softmax','linear', 'relu'}
        d{n} = - nn.e;
  2. Anyway, with the help of your code I added ReLU support to a clean DeepLearningToolbox instance. Then I tried to check the gradients for numerical correctness by updating test_nn_gradients_are_numerically_correct() to include relu in the tests. But the test started throwing an assertion violation in nnchecknumgrad() with the 'relu' activation and 'linear' output. Merely raising the error threshold by a factor of 10 didn't help, so I wonder why that is. I checked the relu implementation multiple times and it looks good. (BTW, that's when I found the reason for the first question, but it doesn't matter here, because the output is a linear unit.) Did you try updating your version of test_nn_gradients_are_numerically_correct() to test ReLU? (A minimal sketch of what I mean by the check is below, after this list.)

  3. Just curious, why did you remove bias units from the NN code? Biases significantly enhance a neuron's ability to discriminate various patterns. What's the point of removing them?
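
For reference, the check I mean boils down to comparing central finite differences against the analytical gradient. A minimal standalone sketch (the toy loss, step size, and tolerance are just illustrative choices, not the toolbox's actual values):

% Minimal central-difference gradient check (illustration only, not nnchecknumgrad itself)
f = @(w) 0.5 * sum(max(w, 0).^2);    % toy loss through a ReLU
g = @(w) max(w, 0);                  % its analytical gradient
w = randn(5, 1);

h = 1e-6;                            % perturbation size
num_grad = zeros(size(w));
for k = 1:numel(w)
    e = zeros(size(w));
    e(k) = h;
    num_grad(k) = (f(w + e) - f(w - e)) / (2 * h);
end

rel_err = norm(num_grad - g(w)) / max(norm(num_grad) + norm(g(w)), eps);
assert(rel_err < 1e-7, 'gradient check failed: relative error %g', rel_err);

In double precision a check like this passes comfortably.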

Thanks!

Arech avatar Mar 14 '15 12:03 Arech

Hi there!

  1. First, you should be aware that this code was put together very quickly for a paper deadline, and that the main focus of the work is a spiking neural network implementation (which is also relevant to your point #3). So you're likely to find some small bugs in it until the work is accepted for publication.

It is true that the output function should be different for a correct implementation of normal backprop. Normally, for the final output layer you would use a softmax, which has been shown to be a better choice for classification layers. However, since we were targeting a particular spiking implementation, having a unified neuron model across all the layers was important to us. You generally shouldn't use a relu in the output layer unless you have a constraint like ours. I'll retest the code, though, after fixing this.
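
To spell out the difference (a summary of the standard derivation, assuming the code's e = target - output convention): for a linear output with squared error, and likewise for a softmax output with cross-entropy, the loss derivative with respect to a unit's total input z simplifies to

\[ \frac{\partial L}{\partial z} = a - t = -e, \]

which is exactly what d{n} = - nn.e computes. For a ReLU output, a = max(z, 0), so the chain rule adds a gate:

\[ \frac{\partial L}{\partial z} = (a - t)\,\mathbf{1}[z > 0] = -e \cdot \mathbf{1}[a > 0], \]

which is the masked form suggested above.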

  2. I haven't fixed the gradient checking yet, so that's probably the issue.

  3. For a spiking neuron implementation, biases can be implemented either as an injection current or as a background spike rate. Both are costly in an event-driven implementation - which is where we would like to go with this - so before delving into that additional complexity, we're focusing on implementations without biases. Later on, if we need them, we can add them back in. Even without biases we can reach >99.2% accuracy on MNIST, so they're only necessary for that last bit.
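
Concretely, by "injection current" I mean something like the following toy integrate-and-fire loop (made-up names and values, not our actual neuron model; the background-spike-rate alternative would replace bias_cur with randomly generated input spikes through a bias weight):

% Illustrative integrate-and-fire neuron with the bias as a constant injected current
dt        = 1e-3;     % simulation timestep (s)
T         = 1000;     % number of timesteps
threshold = 1.0;      % firing threshold
bias_cur  = 0.05;     % constant injection current standing in for the bias
v         = 0;        % membrane potential
spikes    = zeros(T, 1);

for t = 1:T
    input_cur = 0.2 * rand();       % stand-in for weighted input spikes
    v = v + input_cur + bias_cur;   % integrate input plus the bias current
    if v >= threshold
        spikes(t) = 1;              % emit a spike
        v = v - threshold;          % reset by subtraction
    end
end
fprintf('output rate: %.1f Hz\n', sum(spikes) / (T * dt));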

I presume you're asking because you're interested in applying the well-performing ReLUs to a traditional deep learning dataset. You can definitely use this code for that, but understand that the focus of this work was on something different, so it may not be optimized for your use case. I think you've found the primary differences between our goal and traditional ML goals (no biases and a unified neuron model across all layers), but there might be others.

dannyneil avatar Mar 18 '15 15:03 dannyneil

Hi! I think I was able to fix the gradient checking and find the correct implementation of the ReLU. The only change needed is indeed to set the correct output-layer derivative for the ReLU (let's leave aside the fact that a ReLU isn't a good output unit, as you wrote). But doing that alone won't work: I spent quite some time trying to figure out what was wrong until I changed the floating-point type from single precision to double. So the correct implementation is, I think, the following. For the output layer:

switch nn.output
...
    case 'relu'
        d{n} = - nn.e .* double(nn.a{n} > 0);

and for hidden layers:

switch nn.activation_function
...
    case 'relu'
        d_act = double(nn.a{i} > 0);

These changes make the numerical gradient check happy. I didn't make a pull request into the DeepLearnToolbox repository though... I'm not very familiar with GitHub, so maybe someone else might want to do it.
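
For what it's worth, the single-vs-double issue is easy to reproduce in isolation: with single precision, the round-off error in the finite differences alone is already far above a typical gradient-check tolerance, so no reasonable threshold will save it. A standalone toy comparison (the step size and test function are arbitrary choices, just for illustration):

% Toy comparison of central-difference error in double vs single precision
h   = 1e-6;
x64 = 0.7;                    % a double-precision point
x32 = single(x64);
f   = @(x) x .* max(x, 0);    % simple ReLU-flavoured function, f'(x) = 2x for x > 0

num64 = (f(x64 + h) - f(x64 - h)) / (2 * h);
num32 = (f(x32 + single(h)) - f(x32 - single(h))) / (2 * single(h));
exact = 2 * x64;

fprintf('double: |error| = %.1e\n', abs(num64 - exact));          % on the order of 1e-10
fprintf('single: |error| = %.1e\n', abs(double(num32) - exact));  % on the order of 1e-2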

Good luck with the paper. Please don't forget to post a link to it somewhere; I'm very interested in neural nets.

Arech avatar Mar 19 '15 11:03 Arech