Flux.jl
Implemented vectorized LRNorm
Added a function for Local Response Normalization. Reference: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf. The previous PRs adding LRNorm used for loops and indexing, which meant automatic differentiation was not supported. I have implemented a vectorized version of LRNorm instead.
I have also added tests for Local Response Normalization.
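For reference, the computation is roughly the following. This is only a simplified sketch of the across-channel formula on a WHCN array; the function name, defaults, and exact slicing here are an illustration, not the exact code in the PR:

# Sketch of across-channel LRN (illustration only, not the PR code).
# y[w, h, c, b] = x[w, h, c, b] / (k + α * Σ_j x[w, h, j, b]^2)^β,
# where j runs over a window of channels around c.
function lrn_sketch(x::AbstractArray{T,4}; n = 5, k = T(2), α = T(1e-4), β = T(0.75)) where T
    C = size(x, 3)
    x² = x .^ 2
    # Sum the squared activations over adjacent channels using range slicing only.
    window = [sum(x²[:, :, max(1, c - n ÷ 2):min(C, c + n ÷ 2), :], dims = 3) for c in 1:C]
    denom = cat(window..., dims = 3)
    return x ./ (k .+ α .* denom) .^ β
end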
Travis CI reports 4 errored tests, but the tests I added for LRNorm pass. The errored tests come from the iris dataset, if I'm not mistaken.
@staticfloat I have made the requested changes. What kind of tests would be appropriate?
I have added tests comparing against tensorflow's LRN. The output values are approximately equal, so I round both outputs to two decimal places before comparing them.
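The comparison looks roughly like this (a sketch, not the exact test code; tf_reference is a placeholder for the precomputed tensorflow values):

using Test, Flux
# Hypothetical check: compare our output against precomputed tensorflow values,
# rounding both to two decimal places first.
y = Flux.data(m(x))   # drop the tracked wrapper to get a plain array
@test round.(y, digits = 2) == round.(tf_reference, digits = 2)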
While I do expect some difference between the two implementations, we should be able to get a lot more than just 2 decimal places of accuracy. If we have to round with digits=2, that makes me think there might be something wrong with our algorithm. Can you show the actual TF output and the comparison with our algorithm?
Here's the output computed by our function:
Input is x = param(reshape(Float32[1:20;], 1, 1, 5, 4))
julia> m = LRNorm()
LRNorm(α = 0.0001, β = 0.75, n = 5, k = 2.0)
julia> m(x)
Tracked 1×1×5×4 Array{Float64,4}:
[:, :, 1, 1] =
0.5942915817132711
[:, :, 2, 1] =
1.1878710105259485
[:, :, 3, 1] =
1.7801403935901126
[:, :, 4, 1] =
2.3736092915813667
[:, :, 5, 1] =
2.9674555452942113
[:, :, 1, 2] =
3.547816324791048
[:, :, 2, 2] =
4.126683180908961
[:, :, 3, 2] =
4.698799923539177
[:, :, 4, 2] =
5.29318127413301
[:, :, 5, 2] =
5.891985429255813
[:, :, 1, 3] =
6.436172025553092
[:, :, 2, 3] =
6.971188482472584
[:, :, 3, 3] =
7.49092944638354
[:, :, 4, 3] =
8.102437972776501
[:, :, 5, 3] =
8.72667895604141
[:, :, 1, 4] =
9.214966343739228
[:, :, 2, 4] =
9.665769310057835
[:, :, 3, 4] =
10.092066777866355
[:, :, 4, 4] =
10.748286820459093
[:, :, 5, 4] =
11.430093727457718
And here is tensorflow's LRN output for the same input:
[[[[ 0.5945813]]
[[ 1.1890287]]
[[ 1.7832088]]
[[ 2.3769882]]
[[ 2.9702334]]]
[[[ 3.5628128]]
[[ 4.154593 ]]
[[ 4.745444 ]]
[[ 5.3352346]]
[[ 5.923835 ]]]
[[[ 6.5111175]]
[[ 7.0969534]]
[[ 7.681217 ]]
[[ 8.263785 ]]
[[ 8.844531 ]]]
[[[ 9.423337 ]]
[[10.000079 ]]
[[10.574641 ]]
[[11.146905 ]]
[[11.716756 ]]]]
There is actually a considerable difference in the output.
Agreed. I suggest tracking down the tensorflow computation and seeing if you can figure out where the difference lies; I think this is more than just floating-point noise, and that there is an actual difference in the algorithm.
I tried tensorflow's local_response_normalization on an input of all ones and tracked how it computes the normalization term. The parameters were: depth_radius n (which I varied from 2 to 4; it is supposed to be the number of adjacent channels to sum over), bias k = 0, alpha = 1, beta = 1. This ensures the normalization term is exactly the sum of squared activations in the local region of interest. The input was a batch of ones with width = height = 2, channels = 4, batch_size = 1. The output of tensorflow's lrn layer is the same for n = 2, 3 and 4:
[[[[0.5 0.5]
[0.5 0.5]]
[[0.5 0.5]
[0.5 0.5]]
[[0.5 0.5]
[0.5 0.5]]
[[0.5 0.5]
[0.5 0.5]]]]
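To spell out why this setup is informative: with k = 0 and alpha = beta = 1, the output for an all-ones input is simply the reciprocal of the number of activations summed in the window, i.e.

output = 1 / (Σ over window of 1^2) = 1 / window_size

so a uniform value of 0.5 means every window summed exactly two activations.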
Also, when the input was a batch with width = height = 4, channels = 1, batch_size = 1: since num_channels is 1, LRNorm across channels should sum over just that channel itself for any n > 1. But tensorflow's lrn layer gives this output for n = 2:
[[[[0.33333334 0.25 0.25 0.33333334]
[0.33333334 0.25 0.25 0.33333334]
[0.33333334 0.25 0.25 0.33333334]
[0.33333334 0.25 0.25 0.33333334]]]]
and for n = 3 and 4:
[[[[0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25]]]]
I think this shows that tensorflow's lrn layer normalizes within a channel, summing the squares of n spatially adjacent activations in the same channel. But the paper's LRN sums over n adjacent activations across different channels at the same spatial location. Please correct me if I have got it wrong.
The two types of local response normalization, across channels and within channels, are discussed here: https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn
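To make the distinction concrete, this is how I understand the two variants for an activation a[x, y, i] (channel i at spatial position (x, y)); just my paraphrase of the paper and the linked answer:

across channels (the paper): b[x, y, i] = a[x, y, i] / (k + α * Σ_j a[x, y, j]^2)^β, with j running over the n channels around i

within a channel (what tensorflow appears to be doing here): b[x, y, i] = a[x, y, i] / (k + α * Σ_{(u,v)} a[u, v, i]^2)^β, with (u, v) running over spatial positions around (x, y)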
I have kinda hit a wall here. Any suggestions would be highly appreciated.