Teacher-Assistant-Knowledge-Distillation
The performance of plain10 and plain2 on CIFAR-100
While trying to reproduce the performance of plain10 and plain2 on CIFAR-100, I ran many experiments but could not reach the accuracy reported in Figure 4(b) of your paper. I then re-read the experimental setup and found only the statement "We also used weight decay with the value of 0.0001 for training ResNets." Could you share the exact training settings for the plain networks, for example the weight decay, the learning rate (and its schedule), and whether random crop augmentation was used?
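For reference, below is a minimal sketch of the training setup I am currently assuming. Only the 0.0001 weight decay is taken from the paper; the learning rate, schedule, batch size, augmentation, and normalization statistics are my own guesses, and the model line is a placeholder for the plain10/plain2 networks. These are exactly the values I would like you to confirm or correct.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Augmentation: random crop with padding=4 plus horizontal flip (crop=True in my runs).
# Normalization uses the commonly quoted CIFAR-100 channel statistics.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])

train_set = torchvision.datasets.CIFAR100(root='./data', train=True,
                                          download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder model; in my actual runs this is the plain10 / plain2 network.
model = torchvision.models.resnet18(num_classes=100)

# Optimizer: SGD with momentum and weight_decay=1e-4 as stated in the paper.
# lr=0.1 and the milestone schedule below are assumptions on my part.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[80, 120], gamma=0.1)
criterion = nn.CrossEntropyLoss()
```

If any of these differ from what you used for Figure 4(b), could you point out which ones?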