Flux.jl
Implemented vectorized LRNorm
Added a function for Local Response Normalization. Reference: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf. The previous PRs adding LRNorm used for loops and indexing, which meant automatic differentiation was not supported. I have implemented a vectorized version of LRNorm instead.
I have also added tests for Local Response Normalization.
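For reference, the computation is roughly the following. This is only a simplified sketch of the across-channel formula on a WHCN array; the function name, defaults, and exact slicing here are an illustration, not the exact code in the PR:

# Sketch of across-channel LRN (illustration only, not the PR code).
# y[w, h, c, b] = x[w, h, c, b] / (k + α * Σ_j x[w, h, j, b]^2)^β,
# where j runs over a window of channels around c.
function lrn_sketch(x::AbstractArray{T,4}; n = 5, k = T(2), α = T(1e-4), β = T(0.75)) where T
    C = size(x, 3)
    x² = x .^ 2
    # Sum the squared activations over adjacent channels using range slicing only.
    window = [sum(x²[:, :, max(1, c - n ÷ 2):min(C, c + n ÷ 2), :], dims = 3) for c in 1:C]
    denom = cat(window..., dims = 3)
    return x ./ (k .+ α .* denom) .^ β
end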
Travis CI reports 4 errored tests, but the tests I added for LRNorm pass. The errored tests come from the iris dataset, if I'm not mistaken.
@staticfloat I have made the requested changes. What kind of tests would be appropriate?
I have added tests comparing against tensorflow's LRN. The output values are approximately equal, so I round both outputs to two decimal places before comparing them.
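The comparison looks roughly like this (a sketch, not the exact test code; tf_reference is a placeholder for the precomputed tensorflow values):

using Test, Flux
# Hypothetical check: compare our output against precomputed tensorflow values,
# rounding both to two decimal places first.
y = Flux.data(m(x))   # drop the tracked wrapper to get a plain array
@test round.(y, digits = 2) == round.(tf_reference, digits = 2)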
While I do expect some difference between the two implementations, we should be able to get a lot more than just 2 decimal places of accuracy. If we have to round with digits=2, that makes me think there might be something wrong with our algorithm. Can you show the actual TF output and the comparison with our algorithm?
Here's the output computed by our function:
Input is x = param(reshape(Float32[1:20;], 1, 1, 5, 4))
julia> m = LRNorm()
LRNorm(α = 0.0001, β = 0.75, n = 5, k = 2.0)
julia> m(x)
Tracked 1×1×5×4 Array{Float64,4}:
[:, :, 1, 1] =
0.5942915817132711
[:, :, 2, 1] =
1.1878710105259485
[:, :, 3, 1] =
1.7801403935901126
[:, :, 4, 1] =
2.3736092915813667
[:, :, 5, 1] =
2.9674555452942113
[:, :, 1, 2] =
3.547816324791048
[:, :, 2, 2] =
4.126683180908961
[:, :, 3, 2] =
4.698799923539177
[:, :, 4, 2] =
5.29318127413301
[:, :, 5, 2] =
5.891985429255813
[:, :, 1, 3] =
6.436172025553092
[:, :, 2, 3] =
6.971188482472584
[:, :, 3, 3] =
7.49092944638354
[:, :, 4, 3] =
8.102437972776501
[:, :, 5, 3] =
8.72667895604141
[:, :, 1, 4] =
9.214966343739228
[:, :, 2, 4] =
9.665769310057835
[:, :, 3, 4] =
10.092066777866355
[:, :, 4, 4] =
10.748286820459093
[:, :, 5, 4] =
11.430093727457718
And here is tensorflow's LRN output for the same input:
[[[[ 0.5945813]]
[[ 1.1890287]]
[[ 1.7832088]]
[[ 2.3769882]]
[[ 2.9702334]]]
[[[ 3.5628128]]
[[ 4.154593 ]]
[[ 4.745444 ]]
[[ 5.3352346]]
[[ 5.923835 ]]]
[[[ 6.5111175]]
[[ 7.0969534]]
[[ 7.681217 ]]
[[ 8.263785 ]]
[[ 8.844531 ]]]
[[[ 9.423337 ]]
[[10.000079 ]]
[[10.574641 ]]
[[11.146905 ]]
[[11.716756 ]]]]
There is actually a considerable difference in the output.
Agreed. I suggest tracking down the tensorflow computation and seeing if you can figure out where the difference lies; I think this is more than just floating-point noise, and that there is an actual difference in the algorithm.
I tried tensorflow's local_response_normalization on an input of all ones and tracked how it computes the normalization term. The parameters were: depth_radius n (which I varied from 2 to 4; it is supposed to be the number of adjacent channels to sum over), bias k = 0, alpha = 1, beta = 1. This ensures the normalization term is exactly the sum of squared activations in the local region of interest. The input was a batch of ones with width = height = 2, channels = 4, batch_size = 1. The output of tensorflow's lrn layer is the same for n = 2, 3 and 4:
[[[[0.5 0.5]
[0.5 0.5]]
[[0.5 0.5]
[0.5 0.5]]
[[0.5 0.5]
[0.5 0.5]]
[[0.5 0.5]
[0.5 0.5]]]]
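To spell out why this setup is informative: with k = 0 and alpha = beta = 1, the output for an all-ones input is simply the reciprocal of the number of activations summed in the window, i.e.

output = 1 / (Σ over window of 1^2) = 1 / window_size

so a uniform value of 0.5 means every window summed exactly two activations.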
Also, when the input was a batch with width = height = 4, channels = 1, batch_size = 1: since num_channels is 1, LRNorm across channels should sum over just that channel itself for any n > 1. But tensorflow's lrn layer gives this output for n = 2:
[[[[0.33333334 0.25 0.25 0.33333334]
[0.33333334 0.25 0.25 0.33333334]
[0.33333334 0.25 0.25 0.33333334]
[0.33333334 0.25 0.25 0.33333334]]]]
and for n = 3 and 4:
[[[[0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25]]]]
I think this shows that tensorflow's lrn layer normalizes within a channel, summing the squares of n spatially adjacent activations in the same channel. But the paper's LRN sums over n adjacent activations across different channels at the same spatial location. Please correct me if I have got it wrong.
The two types of local response normalization, across channels and within channels, are discussed here: https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn
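To make the distinction concrete, this is how I understand the two variants for an activation a[x, y, i] (channel i at spatial position (x, y)); just my paraphrase of the paper and the linked answer:

across channels (the paper): b[x, y, i] = a[x, y, i] / (k + α * Σ_j a[x, y, j]^2)^β, with j running over the n channels around i

within a channel (what tensorflow appears to be doing here): b[x, y, i] = a[x, y, i] / (k + α * Σ_{(u,v)} a[u, v, i]^2)^β, with (u, v) running over spatial positions around (x, y)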
I have kinda hit a wall here. Any suggestions would be highly appreciated.