knowledge-distillation-pytorch
I think I couldn't show that cnn_distill has higher performance than base_cnn.
This is my situation. I trained base_cnn in advance on the CIFAR-10 dataset to compare the performance of base_cnn and cnn_distill.
I also trained base_resnet18 as the teacher on the same dataset. Lastly, I trained cnn_distill using the resnet18 teacher.
I got two accuracies, 0.875 for base_cnn and 0.858 for cnn_distill, from their respective metrics_val_best_weights.json files. It looks like base_cnn is better than cnn_distill.
I didn't change any parameters for base_cnn or cnn_distill except one: the augmentation value in base_cnn's params.json, which I switched from 'no' to 'yes'.
I don't see a reason to use knowledge distillation if base_cnn reaches higher accuracy. Please let me know where I went wrong. Thanks for your time.
@K-Won I think if your base model is too complex, you won't get an improvement from distillation. Try using a smaller student model and training it both with and without distillation; then I think you will see a difference between the two models.
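For reference, here is a minimal sketch of the usual Hinton-style distillation loss. The function name `kd_loss` and the default `alpha`/`T` values are illustrative, not necessarily the repo's exact implementation: a KL term on temperature-softened logits blended with ordinary cross-entropy on the hard labels.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, alpha=0.9, T=4.0):
    """Illustrative distillation loss: soft-target KL term plus hard-label CE."""
    # Soft-target term: KL(student_T || teacher_T), scaled by T^2 as in Hinton et al.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

With `alpha=0` the soft-target term drops out and this reduces to plain cross-entropy training, so the same small student can be trained with and without distillation under otherwise identical params.json settings (including the augmentation flag), which keeps the comparison fair.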