
About the training process

Open · Sharpiless opened this issue 4 years ago · 3 comments

Thanks for your work and code. I see that you have provided the weights of a teacher model trained on CIFAR. Why does the README say that the teacher should be retrained?

Sharpiless — Aug 25 '21 11:08

It's just an option. You don't need to retrain the teacher. I only wanted to show in the README that training the teacher yourself is possible. I'll update the README as you suggested.
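
For reference, a minimal sketch of loading the provided teacher checkpoint instead of retraining. The class name `TeacherNet` and the checkpoint path are assumptions for illustration, not the repo's confirmed names:

```python
import torch

# Hypothetical names: adjust `TeacherNet` and the checkpoint path to the
# actual class and file shipped with this repo.
from models import TeacherNet

teacher = TeacherNet(num_classes=10)
state = torch.load('pretrained/teacher_cifar10.pt', map_location='cpu')
teacher.load_state_dict(state)
teacher.eval()  # frozen teacher: only used to produce soft targets
```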

da2so — Aug 26 '21 04:08

Thanks for your reply. One more question: the original paper uses cross-entropy loss, while your code uses BCE loss. Is that the right thing to do?
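
For context, a minimal sketch contrasting the two objectives, using the `outputs_S`/`outputs_T` logit names that appear later in this thread. The temperature value and the exact BCE form are assumptions, not the repo's confirmed code:

```python
import torch
import torch.nn.functional as F

T = 20.0                        # temperature; value here is illustrative
outputs_S = torch.randn(8, 10)  # student logits (dummy batch)
outputs_T = torch.randn(8, 10)  # teacher logits (dummy batch)

# Paper-style objective: cross-entropy between temperature-softened
# distributions, commonly computed as a KL divergence and rescaled by T^2.
kd_ce = F.kl_div(
    F.log_softmax(outputs_S / T, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction='batchmean',
) * (T * T)

# One possible BCE formulation: each class score is matched independently
# as a Bernoulli probability rather than as part of a single categorical
# distribution. The repo's actual BCE form may differ.
kd_bce = F.binary_cross_entropy(
    torch.sigmoid(outputs_S),
    torch.sigmoid(outputs_T),
)
```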

Sharpiless — Aug 26 '21 08:08

There are also some implementation details that differ from the original paper.

For MNIST, the batch size in the paper is 512 and the generator learning rate is 3.0, with 24k synthesized samples. Also, the temperature division should be applied to outputs_S rather than outputs_T (see the sketch below). Would you be willing to re-implement these details in this repo?
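
To make the temperature point concrete, here is a sketch of the two variants. A KL-style loss is used purely for illustration; which variant the code actually computes should be checked against the repo's source:

```python
import torch
import torch.nn.functional as F

T = 20.0                        # temperature; value is illustrative
outputs_S = torch.randn(8, 10)  # student logits (dummy batch)
outputs_T = torch.randn(8, 10)  # teacher logits (dummy batch)

# Reading of the current code: only the teacher logits are softened.
loss_as_is = F.kl_div(
    F.log_softmax(outputs_S, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction='batchmean',
)

# The change suggested above: apply the division to outputs_S instead.
loss_suggested = F.kl_div(
    F.log_softmax(outputs_S / T, dim=1),
    F.softmax(outputs_T, dim=1),
    reduction='batchmean',
)
```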

Sharpiless — Aug 26 '21 09:08