vat_tf
Maybe a wrong loss function
Hello Takeru, thanks for your great work. I think there may be an error in your code, at line 46 of vat.py: " dist = L.kl_divergence_with_logit(logit_p, logit_m)". I believe a negative sign may need to be added in front of the KL divergence, because here we want to maximize the distance in order to get the virtual adversarial direction. Am I right?
Hi,
It's not wrong. The "positive" gradient of dist is the direction that maximizes dist (i.e. the KL divergence).
I suspect you are confusing it with the gradient descent algorithm, in which we add the "negative" gradient to a variable.
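To make the point concrete, here is a minimal toy sketch (not the code from vat.py; toy_logits, kl_with_logit, and the value of x below are made up for illustration). The gradient of the KL divergence with respect to the perturbation already points in the direction of steepest increase, so no minus sign is needed:

```python
import tensorflow as tf

# Toy sketch: treat the KL divergence between the prediction at x and at
# x + d as a function of the perturbation d.  Its gradient with respect
# to d points in the direction of steepest *increase* of the KL
# divergence, which is exactly the "virtual adversarial" direction.

def toy_logits(x):
    # Stand-in for a classifier's forward pass on a single example.
    return tf.stack([tf.reduce_sum(x), -tf.reduce_sum(x ** 2)])

def kl_with_logit(q_logit, p_logit):
    # KL(softmax(q_logit) || softmax(p_logit)).
    q = tf.nn.softmax(q_logit)
    return tf.reduce_sum(q * (tf.nn.log_softmax(q_logit) - tf.nn.log_softmax(p_logit)))

x = tf.constant([0.5, -1.0, 2.0])
d = tf.random.normal(tf.shape(x))
d = d / tf.norm(d)

with tf.GradientTape() as tape:
    tape.watch(d)
    dist = kl_with_logit(toy_logits(x), toy_logits(x + d))
grad = tape.gradient(dist, d)  # "positive" gradient: moving d along it increases dist
```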
Hi, Takeru,
Thanks for your kind reply; I was indeed confusing it with the GD algorithm.
But I still have another question: according to the code, 'd' is randomly initialized first, the gradient 'grad' at the current 'd' is calculated, and then this gradient 'grad' is taken as 'r_vadv'. My point, however, is that we should take 'd + grad' as 'r_vadv', because the sum of these two vectors is the actual adversarial direction against the current sample x. Do you think so?
Looking forward to your reply, thanks again!
Right, that would be another option for estimating the adversarial perturbation and might improve the performance. The code is an implementation of the power iteration method, which we use to estimate the most vulnerable direction. See Section 3.3 of the paper: https://arxiv.org/pdf/1704.03976.pdf.
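For reference, here is a rough sketch of that power-iteration estimate of r_vadv (a paraphrase of Section 3.3, not the repo's exact code; `forward`, `estimate_r_vadv`, and the hyper-parameter values are placeholder assumptions):

```python
import tensorflow as tf

# Rough sketch of the power-iteration estimate of r_vadv (Section 3.3).
# `forward` stands in for the model's forward pass; xi, eps and
# num_power_iterations mirror VAT's hyper-parameters, but the values
# here are only placeholders.

def kl_with_logit(q_logit, p_logit):
    q = tf.nn.softmax(q_logit)
    return tf.reduce_sum(
        q * (tf.nn.log_softmax(q_logit) - tf.nn.log_softmax(p_logit)), axis=-1)

def estimate_r_vadv(forward, x, xi=1e-6, eps=8.0, num_power_iterations=1):
    logit_p = tf.stop_gradient(forward(x))       # clean prediction, held fixed
    d = tf.random.normal(tf.shape(x))            # random initial direction
    for _ in range(num_power_iterations):
        d = xi * tf.math.l2_normalize(d, axis=-1)  # per-example normalization simplified here
        with tf.GradientTape() as tape:
            tape.watch(d)
            logit_m = forward(x + d)
            dist = tf.reduce_mean(kl_with_logit(logit_p, logit_m))
        d = tape.gradient(dist, d)               # positive gradient, no minus sign
    # The normalized gradient (not d + grad) is returned, since the power
    # iteration approximates the dominant eigenvector of the Hessian of
    # the KL divergence at x.
    return eps * tf.math.l2_normalize(d, axis=-1)
```

Because the power iteration converges to the dominant eigenvector (the most vulnerable direction), the normalized gradient itself, rather than 'd + grad', is used as the estimate.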