
Add correct batch size for RNN hidden layer

DhairyaLGandhi opened this pull request 5 years ago • 12 comments

Fixes https://github.com/FluxML/Flux.jl/issues/1114

The context here is that on the first call to the layer (also simulated by calling Flux.reset! on the structure), the gradient for the hidden state was returned with the same size as the state at that time (a vector), whereas on stepping through time the state becomes batched and its size changes. We need to account for this and send back a gradient of the correct size. That already happens automatically on subsequent calls; it is only on the first run that we see this issue.
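
For reference, a minimal sketch of the setup that triggers this (layer and batch sizes are illustrative, not taken from the issue):

```julia
using Flux, CuArrays

m = RNN(8, 8) |> gpu               # CUDNN-backed recurrent layer
x = cu(rand(Float32, 8, 10))       # feature dim 8, batch of 10

Flux.reset!(m)                     # hidden state back to the initial vector
gs = gradient(() -> sum(m(x)), Flux.params(m))
# On this first call after reset! the stored state is a vector while the
# output is 8×10, so the pullback can see size(dho) == (8,) but
# size(ho) == (8, 10).
```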

@MikeInnes @maleadt, would this be an alright fix?

DhairyaLGandhi avatar May 05 '20 11:05 DhairyaLGandhi

Maybe add a test?

maleadt avatar May 05 '20 11:05 maleadt

As I understand it, ho should already have been expanded (via hbatch) during the forwards pass. Its gradient dho should have the same size as ho, in which case AFAICT this patch shouldn't be needed.

I don't think this PR would do any harm, but that seems fishy and it may be worth understanding why it happens in case something else is going wrong.
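
For concreteness, a sketch of the kind of expansion being discussed (the real helper is the hbatch mentioned above, in rnn.jl; the names here are illustrative only):

```julia
# Broadcast a vector hidden state across the batch so it matches the
# (features × batch) layout CUDNN works with; leave matrices alone.
expand_h(h::AbstractVector, batch) = repeat(h, 1, batch)
expand_h(h::AbstractMatrix, batch) = h

# The patch effectively applies the same treatment to dho, expanding a
# vector gradient to match ho before backwardData sees it.
```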

MikeInnes avatar May 05 '20 11:05 MikeInnes

https://github.com/JuliaGPU/CuArrays.jl/blob/eac33f8f383f473c31bd879611fad15ee56671bb/src/dnn/rnn.jl#L188 (and similar lines) is where that expansion happens for h. But that doesn't happen in the forwards pass...

This just applies the same treatment to dho as to h.

DhairyaLGandhi avatar May 05 '20 11:05 DhairyaLGandhi

I believe that line mirrors this one, which does happen in the forward pass (we just do it again in the backwards pass because we don't have access to the expanded h).

It would be worth printing out the size of ho and dho at the start of the pullback. Pullbacks are supposed to be able to rely on them being the same size (and if they are then there's no issue).
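
Something like this at the top of the pullback (a sketch; the surrounding code is elided):

```julia
@show size(ho) size(dho)
@assert size(dho) == size(ho) "pullback called with a mismatched gradient"
```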

MikeInnes avatar May 05 '20 11:05 MikeInnes

Found this

(size(ho), size(dho)) = ((8, 10), (8,))

This was right after the model was reset! (via Flux.reset!).

All subsequent calls have the same size. I am assuming it's because h is a vector that we get a vector back here? https://github.com/JuliaGPU/CuArrays.jl/blob/eac33f8f383f473c31bd879611fad15ee56671bb/src/dnn/rnn.jl#L185
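
That would at least be consistent with the convention that a gradient mirrors the shape of the value it corresponds to, e.g.:

```julia
using Zygote

h = rand(Float32, 8)               # vector state
dh, = gradient(h -> sum(2 .* h), h)
size(dh) == size(h)                # true: vector in, vector gradient out
```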

DhairyaLGandhi avatar May 05 '20 11:05 DhairyaLGandhi

What should the test actually be? Just calling into the backwardData function?
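
Maybe something end-to-end rather than calling backwardData directly; a sketch (dimensions arbitrary):

```julia
using Flux, CuArrays, Test

@testset "RNN hidden-state gradient after reset!" begin
    m = RNN(8, 8) |> gpu
    x = cu(rand(Float32, 8, 10))
    Flux.reset!(m)
    ps = Flux.params(m)
    gs = gradient(() -> sum(m(x)), ps)   # used to throw a size mismatch
    for p in ps
        @test gs[p] === nothing || size(gs[p]) == size(p)
    end
end
```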

DhairyaLGandhi avatar May 05 '20 11:05 DhairyaLGandhi

If h is a vector then dh should also be a vector; I'm not sure if we handle that correctly. However, it seems likely to be independent of the current issue.

The fact that size(dho) != size(ho) implies that some other function is getting called with a matrix (ho) but producing a vector (dho) as a gradient, which is wrong. Ideally we'd fix that function, whatever it is.

MikeInnes avatar May 05 '20 12:05 MikeInnes

@DhairyaLGandhi @MikeInnes Have you had a moment to look at the above?

With the following latest releases, the issue no longer seems to be reproducible:

  [3a865a2d] CuArrays v2.2.2
  [a93c6f00] DataFrames v0.21.2
  [587475ba] Flux v0.10.4
  [28b8d3ca] GR v0.50.1
  [91a5bcdd] Plots v1.4.3
  [2913bbd2] StatsBase v0.33.0
  [e88e6eb3] Zygote v0.4.22

However, I couldn't identify which change to Flux, CuArrays, or Zygote may have solved the issue. Any pointer would be welcome.

jeremiedb avatar Jun 23 '20 06:06 jeremiedb

Am I right in understanding that the MWE in FluxML/Flux.jl#1114 works as expected now?

DhairyaLGandhi avatar Jun 23 '20 06:06 DhairyaLGandhi

Yes!

jeremiedb avatar Jun 23 '20 06:06 jeremiedb

Wonder if the Adapt issue is related then; an incorrect size in the backwards pass could result from the ndims of a Transpose or similar.
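
For instance, a Transpose changes ndims, so a wrapped vector already looks like a matrix to size-based code:

```julia
using LinearAlgebra

v  = rand(Float32, 8)
vt = transpose(v)   # Transpose wrapper around a vector

ndims(v)    # 1
ndims(vt)   # 2: a 1×8 row, so size/ndims checks take the matrix branch
```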

DhairyaLGandhi avatar Jun 23 '20 06:06 DhairyaLGandhi

Ref JuliaGPU/Adapt.jl#24, I suppose

DhairyaLGandhi avatar Jun 23 '20 06:06 DhairyaLGandhi