
Support Recurrent BatchNormalization with @recurrent

Open dwf opened this issue 8 years ago • 2 comments

From @cooijmanstim:

I tried implementing it [recurrent batch normalization] in blocks at some point, but it became a mess because the batch statistics are in the inner graph, so you can't use them directly to update the population statistics. AFAIK you would have to include them as scan outputs, and that would break the whole @recurrent mechanism, because whoever calls the recurrent apply (e.g. RecurrentStack) would have to be aware of the extra outputs. I don't understand how they do it in the Keras code.

This seems like it could be fixed with some extension of the @recurrent mechanism. I don't know enough about it though.

dwf · Apr 15 '16 21:04
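For context, here is a minimal sketch of the problem in plain Theano (not the actual Blocks @recurrent machinery): the per-timestep batch statistics exist only inside scan's inner graph, so exposing them means adding extra scan outputs, which changes the recurrent apply's signature for every caller. All names below are hypothetical.

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
dim = 8

W = theano.shared(np.random.randn(dim, dim).astype(floatX), name='W')
U = theano.shared(np.random.randn(dim, dim).astype(floatX), name='U')
gamma = theano.shared(np.ones(dim, dtype=floatX), name='gamma')
beta = theano.shared(np.zeros(dim, dtype=floatX), name='beta')

def step(x_t, h_tm1):
    pre = T.dot(x_t, W) + T.dot(h_tm1, U)
    # batch statistics live only in scan's inner graph
    mean_t = pre.mean(axis=0)
    var_t = pre.var(axis=0)
    normed = (pre - mean_t) / T.sqrt(var_t + 1e-5)
    h_t = T.tanh(gamma * normed + beta)
    # to reach mean_t / var_t from the outside, they have to become extra
    # scan outputs, which changes the signature of the recurrent apply
    return h_t, mean_t, var_t

x = T.tensor3('x')  # (time, batch, features)
h0 = T.zeros((x.shape[1], dim))
(h, means, variances), _ = theano.scan(step, sequences=x,
                                       outputs_info=[h0, None, None])
```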

One idea: is it possible to update an external shared variable from within the scan inner graph? You could have an update that stashes the batch statistics in an external shared variable, and then make sure that update runs before the population updates (@nouiz might know whether this is viable).

dwf · Apr 15 '16 21:04
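A rough sketch of this idea, assuming plain Theano scan rather than @recurrent: the step function is allowed to return an updates dictionary for shared variables, which scan threads through the loop and hands back to be passed to theano.function. The variable name stashed_mean is hypothetical.

```python
from collections import OrderedDict
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
dim = 8

W = theano.shared(np.random.randn(dim, dim).astype(floatX), name='W')
# external shared variable that the inner graph writes into
stashed_mean = theano.shared(np.zeros(dim, dtype=floatX), name='stashed_mean')

def step(x_t, h_tm1):
    pre = T.dot(x_t, W) + h_tm1
    mean_t = pre.mean(axis=0)
    h_t = T.tanh(pre - mean_t)
    # stash the batch statistic (here a running sum over timesteps) by
    # returning an update for the external shared variable
    return h_t, OrderedDict([(stashed_mean, stashed_mean + mean_t)])

x = T.tensor3('x')  # (time, batch, features)
h0 = T.zeros((x.shape[1], dim))
h, scan_updates = theano.scan(step, sequences=x, outputs_info=[h0])

# the stash only happens if scan's updates are handed to theano.function
f = theano.function([x], h, updates=scan_updates)
```

One caveat on the ordering concern: within a single theano.function call, all updates are computed from the pre-call values, so a population-statistics update in the same function would have to be built from the new-value expression scan_updates[stashed_mean] rather than from the shared variable itself, or else run in a separate function call.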

I suppose the update attribute might be useful here, but a complicating factor is that a different shared variable, or a different subtensor of the same shared variable, needs to be updated at each timestep.

cooijmanstim · Apr 15 '16 22:04
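To illustrate the per-timestep complication: if the population statistics are kept per timestep (one row per step), the inner update has to write a different subtensor on each iteration, e.g. with T.set_subtensor indexed by a step counter passed in as a sequence. This is only a sketch in plain Theano; pop_means, max_len, and the 0.9 momentum are hypothetical.

```python
from collections import OrderedDict
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
dim, max_len = 8, 50

W = theano.shared(np.random.randn(dim, dim).astype(floatX), name='W')
# one row of population statistics per timestep
pop_means = theano.shared(np.zeros((max_len, dim), dtype=floatX),
                          name='pop_means')

def step(t, x_t, h_tm1):
    pre = T.dot(x_t, W) + h_tm1
    mean_t = pre.mean(axis=0)
    h_t = T.tanh(pre - mean_t)
    # only row t of the population statistics is touched at this step
    new_row = 0.9 * pop_means[t] + 0.1 * mean_t
    updates = OrderedDict([(pop_means, T.set_subtensor(pop_means[t], new_row))])
    return h_t, updates

x = T.tensor3('x')  # (time, batch, features)
steps = T.arange(x.shape[0])
h0 = T.zeros((x.shape[1], dim))
h, scan_updates = theano.scan(step, sequences=[steps, x], outputs_info=[h0])
f = theano.function([x], h, updates=scan_updates)
```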