adversarial_training_methods
The question about gradient in VAT
Hi, I have some questions about the gradient computation in VAT.

The function `get_v_adv_loss(self, ul_batch, p_mult, power_iterations=1)` in `paper_network.ipynb` contains the statement:

```python
gradient = tf.gradients(kl, [d], aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)[0]
```
Is this used to compute $g=\nabla_{s+d}\,\mathrm{KL}[p(\cdot|s;\hat{\theta})\,\|\,p(\cdot|s+d;\hat{\theta})]$ as defined in Eq. (7) of the original paper?

In the paper, however, the author takes the gradient of the KL divergence with respect to $s+d$, while the code takes the gradient with respect to `[d]`. Why is that? Are the two the same? Thanks!
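For context, here is a quick numeric sanity check (not from the repo) of the identity in question: for a fixed input $s$, the Jacobian of $x = s + d$ with respect to $d$ is the identity, so by the chain rule $\nabla_d f(s+d) = \nabla_{s+d} f(s+d)$. The function `f` below is a hypothetical scalar stand-in for the KL term:

```python
import numpy as np

def f(x):
    # hypothetical scalar stand-in for KL(p(.|s) || p(.|s+d))
    return np.sum(x ** 2)

def num_grad(g, v, eps=1e-6):
    # central finite-difference gradient of scalar function g at v
    grad = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = eps
        grad[i] = (g(v + e) - g(v - e)) / (2 * eps)
    return grad

s = np.array([1.0, 2.0, 3.0])
d = np.array([0.1, -0.2, 0.05])

grad_wrt_d = num_grad(lambda dd: f(s + dd), d)  # gradient w.r.t. d, with s held fixed
grad_wrt_x = num_grad(f, s + d)                 # gradient w.r.t. x = s + d directly

print(np.allclose(grad_wrt_d, grad_wrt_x))      # the two gradients agree
```

So differentiating with respect to `d` (as the code does) yields the same vector as differentiating with respect to `s + d` (as the paper writes), because `s` carries no dependence on `d`.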