Teacher-Assistant-Knowledge-Distillation
The performance of plain10 and plain2 on CIFAR-100
While trying to reproduce the performance of plain10 and plain2 on CIFAR-100, I ran many experiments but could not reach the accuracy reported in Figure 4(b) of your paper. I then re-read the experimental setup and found only the statement "We also used weight decay with the value of 0.0001 for training ResNets." Could you share the exact training settings for the plain networks, for example the weight decay, the learning rate (and its schedule), and whether random crop augmentation was used?
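For reference, below is a minimal sketch of the training setup I am currently assuming. Only the 0.0001 weight decay is taken from the paper; the learning rate, schedule, batch size, augmentation, and normalization statistics are my own guesses, and the model line is a placeholder for the plain10/plain2 networks. These are exactly the values I would like you to confirm or correct.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Augmentation: random crop with padding=4 plus horizontal flip (crop=True in my runs).
# Normalization uses the commonly quoted CIFAR-100 channel statistics.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])

train_set = torchvision.datasets.CIFAR100(root='./data', train=True,
                                          download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder model; in my actual runs this is the plain10 / plain2 network.
model = torchvision.models.resnet18(num_classes=100)

# Optimizer: SGD with momentum and weight_decay=1e-4 as stated in the paper.
# lr=0.1 and the milestone schedule below are assumptions on my part.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[80, 120], gamma=0.1)
criterion = nn.CrossEntropyLoss()
```

If any of these differ from what you used for Figure 4(b), could you point out which ones?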