Noisy_Adapter: the curves of average γ
Hi @hunto,
I have been following your work for a long time and I am very excited that the code has been made public in the target classification task.
However, I found a problem: when I use ResNet-34 (teacher) to train ResNet-18 (student) with the B1 baseline setting on the CIFAR-10 dataset, the curve of the average γ does not match your result. It stays close to 1 and does not decrease.
These two images show the γ of the noisy adapter for feature KD and logit KD, respectively.
Hi @omglet1,
I think the values of gamma depend on the task and the models you use in KD. CIFAR-10 is widely regarded as a very easy dataset, and all the students and teachers can reach ~100% training accuracy. So I think there might not be a significant gap between the teacher and student noise. When gamma equals 1, it means that no additional noise needs to be added.
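To illustrate why gamma = 1 corresponds to "no additional noise", here is a minimal NumPy sketch. It assumes the adapter blends the student feature with Gaussian noise as `gamma * feat + (1 - gamma) * noise`; that blending form, the function name `apply_noise_adapter`, and the feature shape are assumptions for illustration and are not taken from the repository code.

```python
import numpy as np


def apply_noise_adapter(feat, gamma, rng):
    """Blend a feature with Gaussian noise (assumed form, for illustration)."""
    # Assumption: gamma lies in [0, 1]; gamma == 1 keeps the feature unchanged.
    noise = rng.standard_normal(feat.shape)
    return gamma * feat + (1.0 - gamma) * noise


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8))  # hypothetical batch of student features

same = apply_noise_adapter(feat, 1.0, rng)   # gamma = 1: feature passes through
noisy = apply_noise_adapter(feat, 0.7, rng)  # gamma < 1: some noise is mixed in

print(np.allclose(same, feat))  # True
```

Under this assumed form, an average gamma that stays near 1 simply means the adapter is choosing to add almost no noise to the student features, which would be consistent with a small teacher-student gap on CIFAR-10.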