ConcreteDropout

dropout_regularizer

Open XinDongol opened this issue 7 years ago • 2 comments

In the paper, the entropy of a Bernoulli random variable is H(p) := -p * log(p) - (1-p) * log(1-p)

But in the code, dropout_regularizer is computed as:

```python
dropout_regularizer = self.p * K.log(self.p)
dropout_regularizer += (1. - self.p) * K.log(1. - self.p)
dropout_regularizer *= self.dropout_regularizer * input_dim
```

Could you please explain the meaning of `dropout_regularizer *= self.dropout_regularizer * input_dim`? I cannot find the corresponding equation in your paper.

Thanks for your kind help in advance.
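For reference, a quick numerical check (plain NumPy; the values of `lam` and `input_dim` here are made up for illustration) suggests the code's expression is just the negative Bernoulli entropy -H(p) scaled by the regularizer weight and the input dimension:

```python
import numpy as np

p = 0.2            # dropout probability (assumed value)
lam = 1e-5         # plays the role of self.dropout_regularizer (assumed value)
input_dim = 128    # number of input units (assumed value)

# Entropy of a Bernoulli(p) variable, as written in the paper
H = -p * np.log(p) - (1. - p) * np.log(1. - p)

# The code's computation, line by line
reg = p * np.log(p)
reg += (1. - p) * np.log(1. - p)
reg *= lam * input_dim

# The code evaluates to lam * input_dim * (-H(p))
assert np.isclose(reg, lam * input_dim * (-H))
```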

XinDongol avatar Dec 19 '17 02:12 XinDongol

I believe this is for scaling the lambda by the output shape, before applying it.

However I think @yaringal is the only one who can properly answer this.

While re-reading the code, I noticed that the comment above `input_dim` mentions ignoring the final dimension, yet the slice ignores the first dimension. Is the comment or the code wrong?

joeyearsley avatar Dec 19 '17 05:12 joeyearsley

Normal dropout upscales the feature vector by 1/(1-p) after dropping units out. We do the same, substituting W' = W/(1-p) into the model and the KL calculations. Y

Edit (2019): see lines

```python
kernel_regularizer = self.weight_regularizer * tf.reduce_sum(tf.square(weight)) / (1. - self.p)
```

and

```python
retain_prob = 1. - self.p
x /= retain_prob
```
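That rescaling is standard inverted dropout; a minimal NumPy sketch (variable names are mine, not from the repo) shows that dividing the surviving units by 1-p keeps the expected value of the layer output unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                           # dropout probability (assumed value)
x = np.ones((100_000, 1))         # large batch so the sample mean is stable

mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
out = x * mask

retain_prob = 1. - p
out /= retain_prob                # upscale survivors by 1/(1 - p)

# E[out] == E[x]: the rescaling preserves the expectation
assert abs(out.mean() - x.mean()) < 0.01
```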

yaringal avatar Dec 19 '17 11:12 yaringal