ConcreteDropout
dropout_regularizer
In the paper, the entropy of a Bernoulli random variable is defined as
H(p) := -p * log(p) - (1-p) * log(1-p)
But in the code, dropout_regularizer is computed as
dropout_regularizer = self.p * K.log(self.p)
dropout_regularizer += (1. - self.p) * K.log(1. - self.p)
dropout_regularizer *= self.dropout_regularizer * input_dim
Could you please explain the meaning of dropout_regularizer *= self.dropout_regularizer * input_dim? I can see that the first two lines compute -H(p) (see the check below), but I cannot find the equation corresponding to the third line's scaling in your paper.
Thanks for your kind help in advance.
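For reference, a quick NumPy check (my own sketch, not code from the repository) confirming that the first two lines accumulate p*log(p) + (1-p)*log(1-p), i.e. -H(p):

import numpy as np

# Numeric check (my sketch, not repository code): what the first two
# lines of the snippet build is exactly the negative Bernoulli entropy.
p = 0.1
acc = p * np.log(p) + (1. - p) * np.log(1. - p)   # what the code builds
H = -p * np.log(p) - (1. - p) * np.log(1. - p)    # H(p) from the paper
print(acc, -H)  # both print approximately -0.3251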
I believe this is for scaling the lambda by the output shape before applying it. However, I think @yaringal is the only one who can answer this properly.
While re-reading the code, I also noticed that the comment above input_dim mentions ignoring the final dimension, yet the code slices to ignore the first dimension. Is the comment or the code wrong?
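If my reading of the scaling is right, the whole term is just lambda * input_dim * (-H(p)). A minimal sketch of that interpretation (the function and argument names are mine, not the repository's):

import numpy as np

# Sketch of my interpretation: the dropout regularizer is the negative
# Bernoulli entropy -H(p), weighted by the user-supplied lambda and by
# the number of input features, so the per-unit penalty is counted once
# per input dimension.
def dropout_regularizer(p, lam, input_dim):
    neg_entropy = p * np.log(p) + (1. - p) * np.log(1. - p)  # -H(p)
    return lam * input_dim * neg_entropy

print(dropout_regularizer(p=0.1, lam=1e-5, input_dim=512))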
Normal dropout upscales the feature vector by 1/(1-p) after dropping out units. We do the same, substituting W'=W/(1-p) into the model and KL calculations. Y
Edit (2019): see lines
kernel_regularizer = self.weight_regularizer * tf.reduce_sum(tf.square(weight)) / (1. - self.p)
and
retain_prob = 1. - self.p
x /= retain_prob
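Putting those two quoted lines into a self-contained snippet (a sketch under my own assumptions: TF 2.x, scalar p, and made-up shapes and lambda, not the layer's actual code):

import tensorflow as tf

p = 0.1
weight_regularizer = 1e-6          # the user-supplied lambda
weight = tf.random.normal([64, 32])

# KL/L2 term on the rescaled weights: substituting W' = W/(1-p) and
# averaging over the Bernoulli dropout mask leaves a single 1/(1-p) factor.
kernel_regularizer = weight_regularizer * tf.reduce_sum(tf.square(weight)) / (1. - p)

# Forward pass: drop units with probability p, then upscale the
# survivors by 1/retain_prob, exactly as in standard dropout.
x = tf.random.normal([8, 64])
mask = tf.cast(tf.random.uniform(tf.shape(x)) > p, x.dtype)
retain_prob = 1. - p
x = x * mask / retain_prob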