DiffKD

Noisy_Adapter: the curves of average γ

Open: omglet1 opened this issue 1 year ago • 1 comment

Hi @hunto, I have been following your work for a long time, and I am very excited that the code for the classification task has been made public. However, I found a problem: when I use ResNet-34 (teacher) to train ResNet-18 (student) with the B1 baseline setting on the CIFAR-10 dataset, the curves of the average γ do not match your results. The curve stays close to 1 and does not descend. [Image: feature] [Image: logit] These two images show the γ of the noisy adapter for feature KD and logit KD, respectively.

omglet1, Feb 28 '24 04:02

Hi @omglet1 ,

I think the values of gamma depend on the task and the models you use in KD. CIFAR-10 is widely acknowledged to be a very easy dataset, and all the students and teachers can reach ~100% training accuracy, so there may be no significant gap between the teacher and student noise levels. When gamma equals 1, it means that no additional noise should be added.
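To make the meaning of gamma concrete, here is a minimal sketch, assuming the noisy adapter blends the student output with Gaussian noise as `gamma * z + (1 - gamma) * eps`. That form is an assumption chosen to be consistent with the statement that gamma = 1 adds no noise. The names `mix_with_noise` and `average_gamma` are hypothetical and are not taken from the DiffKD code.

```python
import random


def mix_with_noise(features, gamma, rng=None):
    """Blend features with unit Gaussian noise (assumed form, see lead-in).

    gamma = 1 returns the features unchanged; gamma = 0 returns pure noise.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    rng = rng or random.Random(0)
    return [gamma * x + (1.0 - gamma) * rng.gauss(0.0, 1.0) for x in features]


def average_gamma(gammas):
    """Mean of the per-sample gamma values, e.g. for logging a training curve."""
    return sum(gammas) / len(gammas)


if __name__ == "__main__":
    student_features = [0.2, -1.3, 0.7]
    # With gamma = 1 the student features pass through without added noise.
    print(mix_with_noise(student_features, 1.0))
    # A smaller gamma mixes in more noise.
    print(mix_with_noise(student_features, 0.6))
    print(average_gamma([1.0, 0.98, 0.99]))
```

Under this assumption, an average γ that stays near 1 throughout training would simply indicate that the adapter is adding almost no noise to the student outputs.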

hunto, Mar 11 '24 02:03