Deep-Mutual-Learning
Why is the baseline accuracy much lower than a regular ResNet on CIFAR-100?
In https://github.com/weiaicunzai/pytorch-cifar100, ResNet34 gets a 23.24% error rate, and the results are much higher in the self-distillation repo https://github.com/luanyunteng/pytorch-be-your-own-teacher.
@JiyueWang Maybe some hyperparameters or implementation details are different. This repo achieves performance comparable to the DML paper, so I didn't pay much attention to the architecture or to tuning parameters. Personally, I suggest focusing more on the idea of DML than on the network. If you really care about this problem or have a strong interest in this work, I strongly recommend using DML on your own task and dataset.
Even so, you are welcome to tell me if you solve this problem or make any other discoveries.
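For anyone who wants to try the DML idea on their own task, here is a minimal sketch of the per-sample loss for two peer networks, written in plain Python so it does not depend on any framework. It is not taken from this repo: the function names (`softmax`, `cross_entropy`, `kl_divergence`, `dml_losses`) and the epsilon clamp are assumptions made for illustration. The sketch assumes the usual two-network formulation, where each network minimizes its own cross-entropy plus the KL divergence from its peer's predictions to its own.

```python
import math

EPS = 1e-12  # assumed clamp to avoid log(0) and division by zero


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy of predicted probabilities against an integer label."""
    return -math.log(max(probs[label], EPS))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def dml_losses(logits_1, logits_2, label):
    """Return (loss_1, loss_2) for two peer networks on one sample.

    Each network is penalized for its own classification error and for
    disagreeing with its peer's predicted distribution.
    """
    p1 = softmax(logits_1)
    p2 = softmax(logits_2)
    loss_1 = cross_entropy(p1, label) + kl_divergence(p2, p1)
    loss_2 = cross_entropy(p2, label) + kl_divergence(p1, p2)
    return loss_1, loss_2
```

In an actual training loop the same quantities would normally come from the framework's built-in loss functions, averaged over a batch, and the peer's probabilities are usually treated as a fixed target when updating each network.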