
Implemented vectorized LRNorm

Open thebhatman opened this issue 6 years ago • 11 comments

Added a function for Local Response Normalization. Reference: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf. The previous PRs adding LRNorm used for loops and scalar indexing, which automatic differentiation could not handle, so I have implemented a vectorized version of LRNorm.
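
For context, here is a minimal sketch of what a loop-free, across-channel LRN can look like in Julia. This is illustrative only, not the code in this PR; the lrn name, the keyword defaults, and the zero-padding of the channel axis are assumptions following the paper's formula.

# Illustrative across-channel LRN (Krizhevsky et al., Sec. 3.3), written
# without scalar loops or mutation so that automatic differentiation can trace it.
# x is a width × height × channels × batch array.
function lrn(x::AbstractArray{T,4}; α = 1f-4, β = 0.75f0, n = 5, k = 2f0) where T
    W, H, C, B = size(x)
    pad = zeros(T, W, H, n ÷ 2, B)
    sq = cat(pad, x .^ 2, pad; dims = 3)         # zero-pad the channel axis
    # Sum of squares over the n channels around each channel index.
    s = sum(sq[:, :, i:i+C-1, :] for i in 1:n)
    return x ./ (k .+ α .* s) .^ β
end

Since everything here is concatenation, slicing, and broadcasting, gradients should flow through it without custom adjoints.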

thebhatman avatar Mar 30 '19 10:03 thebhatman

I have added tests for Local Response Normalisation.

thebhatman avatar Apr 04 '19 11:04 thebhatman

The Travis CI log shows 4 errored tests, but the tests I added for LRNorm passed. If I'm not mistaken, those errors come from the Iris dataset tests.

thebhatman avatar Apr 04 '19 12:04 thebhatman

@staticfloat I have made the requested changes. What kind of tests would be appropriate?

thebhatman avatar Apr 16 '19 11:04 thebhatman

I have added tests comparing against TensorFlow's LRN. The output values are approximately equal, so I rounded both outputs to two decimal places before comparing them.
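
For illustration, the comparison amounts to something like this (the values are abbreviated from the two outputs pasted further down in the thread):

using Test
ours   = [0.5943, 1.1879, 1.7801]    # first three values from our LRNorm
tf_ref = [0.5946, 1.1890, 1.7832]    # corresponding values from tf.nn.lrn
@test round.(ours, digits = 2) == round.(tf_ref, digits = 2)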

thebhatman avatar Apr 16 '19 22:04 thebhatman

While I do expect some difference between the two implementations, we should be able to get a lot more than just 2 decimal places of accuracy. If we have to round with digits=2, that makes me think there might be something wrong with our algorithm. Can you show the actual TF output and the comparison with our algorithm?

staticfloat avatar Apr 16 '19 23:04 staticfloat

Here's the output computed by our function, for the input x = param(reshape(Float32[1:20;], 1, 1, 5, 4)):

julia> m = LRNorm()
LRNorm(α = 0.0001, β = 0.75, n = 5, k = 2.0)
julia> m(x)
Tracked 1×1×5×4 Array{Float64,4}:
[:, :, 1, 1] =
 0.5942915817132711

[:, :, 2, 1] =
 1.1878710105259485

[:, :, 3, 1] =
 1.7801403935901126

[:, :, 4, 1] =
 2.3736092915813667

[:, :, 5, 1] =
 2.9674555452942113

[:, :, 1, 2] =
 3.547816324791048

[:, :, 2, 2] =
 4.126683180908961

[:, :, 3, 2] =
 4.698799923539177

[:, :, 4, 2] =
 5.29318127413301

[:, :, 5, 2] =
 5.891985429255813

[:, :, 1, 3] =
 6.436172025553092

[:, :, 2, 3] =
 6.971188482472584

[:, :, 3, 3] =
 7.49092944638354

[:, :, 4, 3] =
 8.102437972776501

[:, :, 5, 3] =
 8.72667895604141

[:, :, 1, 4] =
 9.214966343739228

[:, :, 2, 4] =
 9.665769310057835

[:, :, 3, 4] =
 10.092066777866355

[:, :, 4, 4] =
 10.748286820459093

[:, :, 5, 4] =
 11.430093727457718

And here is TensorFlow's LRN output for the same input:

[[[[ 0.5945813]]

  [[ 1.1890287]]

  [[ 1.7832088]]

  [[ 2.3769882]]

  [[ 2.9702334]]]


 [[[ 3.5628128]]

  [[ 4.154593 ]]

  [[ 4.745444 ]]

  [[ 5.3352346]]

  [[ 5.923835 ]]]


 [[[ 6.5111175]]

  [[ 7.0969534]]

  [[ 7.681217 ]]

  [[ 8.263785 ]]

  [[ 8.844531 ]]]


 [[[ 9.423337 ]]

  [[10.000079 ]]

  [[10.574641 ]]

  [[11.146905 ]]

  [[11.716756 ]]]]

thebhatman avatar Apr 17 '19 05:04 thebhatman

There is actually a considerable difference between the two outputs.
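
For instance, taking the first five values from each output above (an illustrative check, not part of the test suite):

ours   = [0.5942916, 1.1878710, 1.7801404, 2.3736093, 2.9674555]   # our LRNorm
tf_ref = [0.5945813, 1.1890287, 1.7832088, 2.3769882, 2.9702334]   # tf.nn.lrn
maximum(abs.(ours .- tf_ref) ./ tf_ref)   # ≈ 1.7e-3, far above Float32 eps ≈ 1.2e-7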

thebhatman avatar Apr 17 '19 05:04 thebhatman

Agreed. I suggest tracking down the TensorFlow computation and seeing if you can figure out where the difference lies. I think this is more than just floating-point noise; there is likely an actual difference in the algorithms.

staticfloat avatar Apr 17 '19 05:04 staticfloat

I tried TensorFlow's local_response_normalization on an input of all ones and tracked how it computes the normalization term. The parameters were: depth_radius n (which I varied from 2 to 4; it is supposed to be the number of adjacent channels to sum over), bias k = 0, alpha = 1, beta = 1. With these settings the normalization term is simply the sum of squares of the activations in the local region of interest. The input was a batch of ones with width = height = 2, channels = 4, batch_size = 1. The output of TensorFlow's lrn layer turned out to be the same for n = 2, 3, and 4:

[[[[0.5 0.5]
   [0.5 0.5]]

  [[0.5 0.5]
   [0.5 0.5]]

  [[0.5 0.5]
   [0.5 0.5]]

  [[0.5 0.5]
   [0.5 0.5]]]]

I also tried an input batch with width = height = 4, channels = 1, batch_size = 1. Since num_channels is 1, LRNorm across channels should sum over just that single channel for any n > 1, so with these parameters the output should be all ones. But TensorFlow's lrn layer gives this output for n = 2:

[[[[0.33333334 0.25       0.25       0.33333334]
   [0.33333334 0.25       0.25       0.33333334]
   [0.33333334 0.25       0.25       0.33333334]
   [0.33333334 0.25       0.25       0.33333334]]]]

and for n = 3 and 4:

[[[[0.25 0.25 0.25 0.25]
   [0.25 0.25 0.25 0.25]
   [0.25 0.25 0.25 0.25]
   [0.25 0.25 0.25 0.25]]]]

I think this proves that TensorFlow's lrn layer is normalizing within a channel, by summing the squares of n adjacent activations (spatially) in the same channel, whereas the paper's LRN sums over n adjacent activations across different channels at the same spatial location. Please correct me if I have got this wrong.
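
For concreteness, here is an illustrative sketch of the two interpretations. This is neither library's code, and the square spatial window in the within-channel version is only my guess at what "adjacent activations" means.

# Both interpretations with k = 0, α = 1, β = 1, so the normalizer is just a
# sum of squares over a window clamped at the array boundaries.
# x is a width × height × channels array (batch dimension dropped for brevity).
half(n) = n ÷ 2
window(i, n, len) = max(1, i - half(n)):min(len, i + half(n))

# Across channels (the paper): at each spatial position (w, h), divide by the
# sum of squares over the n channels around channel c.
function across(x, n)
    out = similar(x)
    for w in axes(x, 1), h in axes(x, 2), c in axes(x, 3)
        out[w, h, c] = x[w, h, c] / sum(abs2, x[w, h, window(c, n, size(x, 3))])
    end
    return out
end

# Within a channel: at each channel c, divide by the sum of squares over a
# spatial window around (w, h) in that same channel.
function within(x, n)
    out = similar(x)
    for w in axes(x, 1), h in axes(x, 2), c in axes(x, 3)
        out[w, h, c] = x[w, h, c] /
            sum(abs2, x[window(w, n, size(x, 1)), window(h, n, size(x, 2)), c])
    end
    return out
end

x = ones(Float32, 2, 2, 4)     # width = height = 2, channels = 4
across(x, 2), within(x, 2)     # compare against the TensorFlow outputs above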

thebhatman avatar May 02 '19 09:05 thebhatman

The two types of local response normalization, across channels and within channels, are discussed here: https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn

thebhatman avatar May 02 '19 09:05 thebhatman

I have kinda hit a wall here. Any suggestions would be highly appreciated.

thebhatman avatar May 14 '19 16:05 thebhatman