Teacher-free-Knowledge-Distillation
The baseline of ResNet18 on CIFAR100 is relatively low
Hi, first of all, thanks for your work on interpreting the relationship between KD and LSR. However, the ResNet18 baseline on CIFAR-100 is much lower than the pytorch-cifar100 implementation, which may be caused by the modified ResNet. In fact, based on pytorch-cifar100, without any extra augmentations, the top-1 accuracy can reach up to 78.05% in my previous experiments. So I have some doubt about the performance gain from self-distillation. I have run an experiment with the distillation, which improves the baseline from 77.96% to 78.45%. It does improve performance, yet not as conspicuously as the paper claims.
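For context, the self-distillation discussed here is the usual KD objective with a pretrained copy of the same network acting as the teacher. Below is a minimal sketch of that loss in PyTorch; the function name, default values of alpha and temperature are illustrative assumptions, not the repo's exact API or the paper's settings.

```python
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, targets,
                           alpha=0.1, temperature=4.0):
    """KD-style loss where the 'teacher' is a frozen, pretrained copy of the student.

    alpha and temperature are the hyper-parameters discussed in this thread;
    the defaults here are placeholders, not the values used in the paper.
    """
    # Hard-label cross-entropy against the ground-truth targets
    ce_loss = F.cross_entropy(student_logits, targets)

    # Soft-label KL divergence against the frozen teacher, scaled by T^2 as usual
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    return (1.0 - alpha) * ce_loss + alpha * kd_loss
```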
Hi,
Q. "In fact, based on the pytorch-cifar100, without any extra augmentations, the top1 accuracy can achieve up to 78.05% in my previous experiments."
A: I also tried that repo, but likewise ResNet18 only achieves around 76%, similar to our paper. The following are the results from pytorch-cifar100, in which ResNet18 achieved 75.61%, not 78%.
Q. "And I have conducted an experiment using the distillation, which improves the baseline from 77.96% to 78.45%." A: Did you tune your hyper-parameters when using the distillation, because if you only try some hyper-parameters, it's normal that the improvement is not significant.
By the way, we don't use extra augmentations for our method; it is still a fair comparison because we also don't use extra augmentations for the baselines (original KD or LSR).
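For clarity, "no extra augmentations" here is assumed to mean the standard CIFAR crop/flip pipeline only, with nothing like Cutout, Mixup, or AutoAugment on top. A sketch of that standard pipeline is below; the normalization statistics are the commonly used CIFAR-100 values, not necessarily the exact ones in either repository.

```python
import torchvision.transforms as transforms

# Standard CIFAR-100 training transform: random crop + horizontal flip only.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
```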
Hi, here is my training log; you can reproduce the result using the repo, which achieves ~78.05% top-1 accuracy without extra augmentations. I think the distillation does work, yet it is not conspicuous: it only improves about 0.5% in my setting.
Hi, your implementation is different from the original pytorch-cifar100; the original pytorch-cifar100 cannot achieve ~78.05% top-1 accuracy. As for the improvement from our method, it also depends on your hyper-parameters, and I don't know whether you searched them or not, so an improvement of about 0.5% with your implementation is normal.