
The asymmetry of KL divergence.

Open · xuguodong03 opened this issue 3 years ago · 1 comment

Hi, I noticed that in 'train_nasty.py', when the KL divergence is computed, the normal teacher's output (output_stu) is used as the input and the nasty teacher's output (output_tch) is used as the target. However, in general KD, the fixed model (the teacher) is usually used as the target, and the model being updated is used as the input.
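For concreteness, here is a minimal PyTorch sketch of the two orderings I mean (the function names and the temperature T are just for illustration and are not taken from the repository):

```python
import torch.nn.functional as F

def kd_kl(student_logits, teacher_logits, T=4.0):
    # Standard KD ordering: the model being updated (the student) provides the
    # `input` (log-probabilities); the frozen teacher provides the `target`.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction='batchmean',
    ) * (T * T)  # usual KD temperature scaling

def nasty_kl(output_stu, output_tch, T=4.0):
    # Ordering I see in train_nasty.py: the fixed model's output (output_stu)
    # is the `input`, and the nasty teacher's output (output_tch, the model
    # being updated) is the `target`.
    return F.kl_div(
        F.log_softmax(output_stu / T, dim=1),
        F.softmax(output_tch / T, dim=1),
        reduction='batchmean',
    ) * (T * T)
```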

I wonder why you adopt the opposite order in the KL loss function. Is there a reason for this? Thanks!

xuguodong03 · Apr 08 '21 11:04

Thanks for asking. I am sorry for the ambiguous variable names; however, the variable names do not affect the results of our paper.

Since we aim to build a nasty teacher model, I named the outputs of the nasty teacher (the model we want to update) output_tch (https://github.com/VITA-Group/Nasty-Teacher/blob/main/train_nasty.py#L56). I named the output of the fixed model "output_stu" simply because, at the very beginning of this project, I tried to use a student network there and co-train the two together; later I found that this idea did not work, and I just kept the variable names unchanged for my other experiments.

Maybe I should change the name of the output from the fixed model (output_stu in https://github.com/VITA-Group/Nasty-Teacher/blob/main/train_nasty.py#L64) to output_adv to make things clearer.
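With that renaming, the overall objective would read roughly like the sketch below. This is an illustration rather than the exact code in the repository; omega and T stand for the KL weight and the temperature, and the specific values are placeholders:

```python
import torch.nn.functional as F

def nasty_teacher_loss(output_tch, output_adv, labels, omega=0.04, T=4.0):
    """Sketch of the training objective after the renaming.

    output_tch -- logits of the nasty teacher (the model being updated)
    output_adv -- logits of the fixed, pretrained network (formerly output_stu),
                  assumed to be computed under torch.no_grad()
    omega, T   -- KL weight and temperature; the values here are placeholders
    """
    ce = F.cross_entropy(output_tch, labels)
    kl = F.kl_div(
        F.log_softmax(output_adv / T, dim=1),  # fixed network as `input`
        F.softmax(output_tch / T, dim=1),      # nasty teacher as `target`
        reduction='batchmean',
    ) * (T * T)
    # Keep the nasty teacher accurate (minimize CE) while pushing its softened
    # outputs away from the fixed network (maximize the KL term). Since
    # output_adv is detached, gradients reach the nasty teacher only through
    # the `target` argument of kl_div.
    return ce - omega * kl
```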

HowieMa · Apr 08 '21 17:04