Dropout with rate: 1 raises on binary backend
@seanmor5 can you give me a short snippet to reproduce it or a failing test? :)
Sure, here's a failing inference:
model =
  Axon.input("data")
  |> Axon.dense(32)
  |> Axon.relu()
  |> Axon.dropout(rate: 1.0)

{init_fn, predict_fn} = Axon.build(model, mode: :train)
params = init_fn.(Nx.iota({1, 10}), %{})
predict_fn.(params, Nx.iota({1, 10}))
We get:
** (ArithmeticError) bad argument in arithmetic expression
(complex 0.4.2) lib/complex.ex:579: Complex.divide/2
(nx 0.3.0) lib/nx/binary_backend.ex:2453: Nx.BinaryBackend."-binary_to_binary/4-lbc$^6/2-11-"/5
(nx 0.3.0) lib/nx/binary_backend.ex:673: Nx.BinaryBackend.element_wise_bin_op/4
(nx 0.3.0) lib/nx/defn/evaluator.ex:199: Nx.Defn.Evaluator.eval_apply/4
(nx 0.3.0) lib/nx/defn/evaluator.ex:81: Nx.Defn.Evaluator.eval/3
(elixir 1.13.4) lib/enum.ex:1715: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
(elixir 1.13.4) lib/enum.ex:1715: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
(nx 0.3.0) lib/nx/defn/evaluator.ex:173: Nx.Defn.Evaluator.eval_apply/4
We are doing a division by zero. We could return NaN, but perhaps it is actually a bug in Axon in this case?
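For reference, a minimal sketch of the same failure outside of Axon, assuming nx 0.3.0's default binary backend (where float division by zero goes through Complex.divide rather than returning Inf):

```elixir
# Assumed minimal repro (nx 0.3.0, binary backend): dividing an f32 tensor by a
# zero tensor raises instead of returning Inf/NaN.
Nx.divide(Nx.iota({1, 10}, type: :f32), Nx.tensor(0.0))
#=> ** (ArithmeticError) bad argument in arithmetic expression
```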
Hey @seanmor5, I've found where the division by zero is happening. It is here in the dropout definition, where keep_prob is 1 - rate (so rate: 1.0 makes it zero):
out = Nx.select(mask, input / keep_prob, Nx.tensor(0, type: Nx.type(input)))
I'm a bit confused about how this works. I understand that the mask uses keep_prob to decide whether or not to keep each value. However, on the second argument of that select, I'm not clear on why that division is needed. If the mask is 1 for that element, shouldn't the element be preserved as it is?
@Ian-GL In most DL frameworks we scale the kept activations in order to preserve the original input's properties (mean, variance, etc.).
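As a minimal sketch of that scaling in plain Nx (not Axon's actual implementation; Nx.random_uniform is the nx 0.3-era API, replaced by Nx.Random in later versions):

```elixir
# Inverted dropout sketch: each element survives with probability keep_prob and
# is scaled by 1 / keep_prob, so the output keeps the input's expected value.
input = Nx.iota({1, 8}, type: :f32)
rate = 0.5
keep_prob = 1.0 - rate

mask = Nx.less(Nx.random_uniform({1, 8}), keep_prob)
out = Nx.select(mask, Nx.divide(input, keep_prob), Nx.tensor(0.0, type: :f32))
```

With rate: 1.0, keep_prob is 0.0, and the Nx.divide above is exactly the division by zero from the report.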
In this case 1.0 is definitely an unrealistic value, so it's not surprising that it fails. I'm just not sure if we should add a small epsilon to the denominator or raise if the value is >= 1. I will need to check the behavior of other frameworks.
@Ian-GL If you want to tackle this, Keras raises when the dropout rate is not on the interval [0, 1), and I think that's a sensible default!
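A hypothetical sketch of what that validation could look like (module and function names are made up for illustration; this is not the actual Axon patch):

```elixir
# Hypothetical validation: raise when the dropout rate is outside [0, 1).
defmodule DropoutValidation do
  def validate_rate!(rate) when is_number(rate) and rate >= 0 and rate < 1, do: :ok

  def validate_rate!(rate) do
    raise ArgumentError,
          "expected dropout :rate to be a number in [0, 1), got: #{inspect(rate)}"
  end
end

DropoutValidation.validate_rate!(0.5)
#=> :ok
DropoutValidation.validate_rate!(1.0)
#=> ** (ArgumentError) expected dropout :rate to be a number in [0, 1), got: 1.0
```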
Sure, @seanmor5, thanks for the guidance! I'll make a PR.