nntrainer
[bn layer] derivative of deviation
When considering the derivative of the deviation (2), there are two incoming derivatives: one from the variance (3) and the other from the intermediate output (5). These two incoming derivatives should be merged and then averaged. However, the current implementation averages the derivative from the intermediate output (5) first and then subtracts (merges) the other one, the derivative from the variance. I think this might lead to a different result.
Please check the following diagram.
(1) input
(2) deviation
(3) variance
(4) inv_std
(5) intermediate output
(6) gamma
(7) beta
(8) output
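To make the two orderings concrete, here is a minimal sketch in plain Python for a single feature over one batch. It is not nntrainer's code: the function name `bn_backward_input`, the `merge_first` flag, and the default `eps` are hypothetical, and it assumes the textbook batch-norm formulation (biased variance, `inv_std = 1 / sqrt(var + eps)`). The signs follow the chain rule directly, so the variance path is added rather than subtracted.

```python
import math

def bn_backward_input(x, dy, gamma=1.0, eps=1e-5, merge_first=True):
    """Gradient w.r.t. the input (1) for one feature, given dy at the output (8)."""
    n = len(x)
    mu = sum(x) / n
    dev = [xi - mu for xi in x]               # (2) deviation
    var = sum(d * d for d in dev) / n         # (3) variance (biased)
    inv_std = 1.0 / math.sqrt(var + eps)      # (4) inv_std

    dx_hat = [g * gamma for g in dy]          # gradient arriving at (5)

    # Path A: (5) -> (2) directly
    from_out = [g * inv_std for g in dx_hat]
    # Path B: (5) -> (4) -> (3) -> (2)
    dvar = -0.5 * inv_std ** 3 * sum(g * d for g, d in zip(dx_hat, dev))
    from_var = [dvar * 2.0 * d / n for d in dev]

    if merge_first:
        # Merge both incoming derivatives at (2), then remove the batch mean.
        merged = [a + b for a, b in zip(from_out, from_var)]
        m = sum(merged) / n
        return [v - m for v in merged]

    # Remove the batch mean of path A only, then merge path B afterwards.
    m = sum(from_out) / n
    return [a - m + b for a, b in zip(from_out, from_var)]


x = [1.0, 2.0, 4.0, 7.0]
dy = [0.1, -0.3, 0.5, 0.2]
merged_then_averaged = bn_backward_input(x, dy, merge_first=True)
averaged_then_merged = bn_backward_input(x, dy, merge_first=False)
max_diff = max(abs(p - q) for p, q in zip(merged_then_averaged, averaged_then_merged))
print(max_diff)
```

Under these assumptions the variance-path term is proportional to the deviation, whose batch mean is zero, so the two orderings would be expected to agree up to floating-point rounding. Whether that also holds for the actual layer depends on how its terms are computed, so it is worth checking against the real implementation.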
:octocat: cibot: Thank you for posting issue #1977. The person in charge will reply soon.