GRAND
The comparison of R(Do) and R(Dn)
In Section 3.3 of the paper, you use the Cauchy-Schwarz inequality to conclude that dropout’s regularization term is an upper bound on DropNode’s. But the two formulas have the forms (a1b1+a2b2+…+anbn)^2 and a1^2b1^2+a2^2b2^2+…+an^2bn^2, and the latter is not necessarily larger than the former. Is the conclusion correct?
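The concern can be checked numerically. The sketch below is only an illustration: the vectors are made-up values, not quantities from the paper, and the helper names `square_of_sum`, `sum_of_squares`, and `cauchy_schwarz_bound` are hypothetical. It shows that neither form bounds the other in general, while the actual Cauchy-Schwarz bound (a1^2+…+an^2)(b1^2+…+bn^2) does hold for the squared sum.

```python
def square_of_sum(a, b):
    # (a1*b1 + a2*b2 + ... + an*bn)^2
    return sum(x * y for x, y in zip(a, b)) ** 2


def sum_of_squares(a, b):
    # a1^2*b1^2 + a2^2*b2^2 + ... + an^2*bn^2
    return sum((x * y) ** 2 for x, y in zip(a, b))


def cauchy_schwarz_bound(a, b):
    # (a1^2 + ... + an^2) * (b1^2 + ... + bn^2), the bound Cauchy-Schwarz gives
    return sum(x * x for x in a) * sum(y * y for y in b)


# Illustrative vectors only (assumed values, not taken from the paper).
same_sign = ([1, 1], [1, 1])
mixed_sign = ([1, 1], [1, -1])

for a, b in (same_sign, mixed_sign):
    print(square_of_sum(a, b), sum_of_squares(a, b), cauchy_schwarz_bound(a, b))
# prints "4 2 4" for same_sign: the squared sum exceeds the sum of squares
# prints "0 2 4" for mixed_sign: the sum of squares exceeds the squared sum
```

With same-sign products the squared sum is the larger quantity, and with mixed signs the ordering flips, so no fixed inequality between the two forms follows from Cauchy-Schwarz alone.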
Many thanks for your interest and comments! We have checked this equation carefully and found that it is indeed a mistake. Dropout therefore does not have a regularization effect similar to DropNode's; it is only an adaptive L2 regularization. We will correct the statement and upload a new version to arXiv. If you have other questions, please feel free to contact us. Thanks very much!