ConcreteDropout
Inconsistency between your code and paper.
I read your paper, Concrete Dropout, and found an inconsistency between the code and the paper. The regularizer on the kernel matrix should be proportional to 1-p (Eq. (3) of your paper), but in your code it is inversely proportional to 1-p:
kernel_regularizer = self.weight_regularizer * K.sum(K.square(weight)) / (1. - self.p)
I am not sure whether I have misunderstood the paper or the code.
That's because we reparametrise Wz (with z ~ Bern(p)^K) as Wz/(1-p) for it to have mean W. Then K.square(weight) has an added term 1/(1-p)^2, which cancels out the 1-p, giving 1/(1-p).
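A quick numeric check of that cancellation, as a rough sketch rather than the repository's code. It assumes the paper's weight matrix (called M here, a name chosen only for illustration) relates to the stored kernel as M = W/(1-p); the two helper functions are hypothetical and exist only to compare the two forms of the regularizer.

```python
import math
import random


def paper_regularizer(M, p, weight_regularizer):
    # Form described in the question: proportional to (1 - p) * ||M||^2.
    return weight_regularizer * (1.0 - p) * sum(m * m for row in M for m in row)


def code_regularizer(W, p, weight_regularizer):
    # Mirrors the line quoted in the question: ||W||^2 / (1 - p).
    return weight_regularizer * sum(w * w for row in W for w in row) / (1.0 - p)


random.seed(0)
p = 0.3                      # dropout probability (arbitrary test value)
weight_regularizer = 1e-4    # arbitrary test value

# Stored kernel W, and the rescaled matrix M = W / (1 - p) (assumed relation).
W = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
M = [[w / (1.0 - p) for w in row] for row in W]

paper_value = paper_regularizer(M, p, weight_regularizer)
code_value = code_regularizer(W, p, weight_regularizer)

# The 1/(1-p)^2 from squaring M cancels one factor of (1-p), leaving 1/(1-p).
print(math.isclose(paper_value, code_value, rel_tol=1e-12))
```

Under that assumed relation the two expressions agree, which is the cancellation the reply describes.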
Could you please explain how Eq. (3) in this paper is derived?
XinDongol, it's Proposition 1 in Appendix 1 of "Dropout as a Bayesian Approximation" (Gal's previous paper).