
ReLU as a literal switch

ghost opened this issue · 3 comments

I guess you can interpret a ReLU neural network as a protean graph of dot products since ReLU is a literal switch. https://github.com/S6Regen/Fixed-Filter-Bank-Neural-Networks

ghost · Oct 21 '19

Hi,

Sorry but I'm super confused about this issue, do you mind elaborating?

Thank you, Mitchell

mitchellnw · Oct 23 '19

Nonspecific confusion is difficult to answer. Do you find it difficult to envision a ReLU as a switch? If you had studied some electronics you would understand switches and rectifiers quite naturally. You can break the behavior of ReLU into two parts: a literal on/off switch, and a trigger decision about when to throw the switch one way or the other. While a switch is binary on/off, its output also depends on its input, on a one-to-one basis. When on, 1 volt in gives 1 volt out, 5 volts in gives 5 volts out. There is a linear aspect. When off, you get zero volts out.
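To make that reading concrete, here is a minimal NumPy sketch (my own illustration, not anything from the dnw code) of ReLU split into a trigger decision and a pass-through switch:

```python
import numpy as np

def relu_as_switch(x):
    """ReLU read as two parts: a trigger decision plus a literal on/off
    switch that either passes the input through unchanged or outputs zero."""
    switch_state = (x >= 0.0)               # trigger decision: on for x >= 0
    return np.where(switch_state, x, 0.0)   # on: pass input through; off: zero out

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(relu_as_switch(x))    # [0. 0. 0. 1. 5.] -- 1 volt in, 1 volt out when on
print(np.maximum(x, 0.0))   # identical to the usual ReLU
```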

What then about a conventional neural network with switches? If you replace all the activation functions with mechanical on/off switches, what can you do? Throwing the switches this way and that gives you different linear projections from the input vector to the output vector. The path from the input vector to the output of a particular neuron is a weighted sum of some weighted sums (which ones being determined by the switches), and so on down the layers, which can be boiled down to a single weighted sum of the input vector. The weighted sum (dot product) of a number of weighted sums reduces to a single weighted sum; it is well known that a deep linear network can be reduced to an equivalent single-layer linear network.
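A small check of that claim, as a hedged NumPy sketch: once the switch states are frozen for a particular input, a two-layer ReLU network collapses to a single matrix applied to that input (the shapes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))   # first layer weights
W2 = rng.standard_normal((3, 8))   # second layer weights
x = rng.standard_normal(4)

# Forward pass: record which switches this input throws on.
h = W1 @ x
s = (h >= 0.0).astype(float)       # switch states for this particular input
y = W2 @ (s * h)                   # ReLU applied as switch state * linear output

# For those fixed switch states, the whole network is one weighted sum of x.
W_effective = W2 @ (np.diag(s) @ W1)
print(np.allclose(y, W_effective @ x))   # True
```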

You then get to the question of how the switching trigger decision of ReLU interacts with all that. It switches on for inputs greater than or equal to zero, off for inputs less than zero. The weighted sum leading into a ReLU can be adjusted to make it a weak learner, and the neurons in the next layer can boost those weak decisions by combining them. The matter is much complicated by the fact that you are trying to produce a particular output at the same time: decision making and construction of the required output are being carried out in the same stream, as it were.


ghost · Oct 24 '19

I'll mention 2 or 3 other things because I don't want to spend a lot of time answering questions. A better switched function might be the parameterized function f(x) = a·x for x ≥ 0, f(x) = b·x for x < 0. Even with conventional networks that would allow the system to organize ResNet-like information pass-through automatically (e.g. a = b = 1). Of course that would mean learning a and b parameters for each individual function as well as the normal weight parameters.
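A minimal sketch of that parameterized function (the names a and b are just illustrative parameters, not from any existing code):

```python
import numpy as np

def switched_linear(x, a, b):
    """f(x) = a*x for x >= 0, f(x) = b*x for x < 0.
    a = 1, b = 0 recovers ordinary ReLU; a = b = 1 is a plain pass-through."""
    return np.where(x >= 0.0, a * x, b * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(switched_linear(x, a=1.0, b=0.0))  # ordinary ReLU
print(switched_linear(x, a=1.0, b=1.0))  # identity: ResNet-like pass-through
```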

The other thing is there are very fast ways of doing certain sets of weighted sums (dot products), such as the FFT and the Walsh-Hadamard transform (WHT). Since they are distributive transforms (a change in a single input affects all the output values) they allow boosting. A fast random projection algorithm can also be formed by applying a predetermined random sign flip to the elements of the input vector before a WHT; basically you are xoring the random sign flips with the more organized sign flips in the WHT matrix. By including those things you can have faster and more flexible neural network arrangements. You have many more choices: random projections are useful for certain things like initial uniformity of variance of the vector elements and 'fair' dimension reduction. The WHT on its own can replace n weighted sums, reducing the computational burden from O(n²) to O(n·ln(n)) while still allowing boosting of all the neuron outputs of the previous layer. You can also include the parameterized activation function. All in a way that is quite consistent with the switching and weighted-sum basis of conventional ReLU based artificial neural networks. A simple generalization; I can't see that there should be any objection.
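Here is a rough NumPy sketch of the WHT-plus-sign-flip idea, assuming a power-of-two vector length; the in-place butterfly below is the standard O(n·ln(n)) fast WHT, and the surrounding names are only illustrative:

```python
import numpy as np

def wht(x):
    """Unnormalized Walsh-Hadamard transform via the standard butterfly,
    O(n log n). The length of x must be a power of two."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(0)
n = 8
signs = rng.choice([-1.0, 1.0], size=n)    # predetermined random sign flips
v = rng.standard_normal(n)

# Fast random projection: flip signs, then apply the WHT (scaled to preserve norm).
projection = wht(signs * v) / np.sqrt(n)
print(projection)
```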


ghost · Oct 24 '19