
About the training process

Open · Sharpiless opened this issue 4 years ago · 3 comments

Thanks for your work and code. I see that you have provided the weights of a teacher model trained on CIFAR. Why does the README say that the teacher should be retrained?

Sharpiless — Aug 25 '21 11:08

It's just an option. You don't need to retrain the teacher. I only wanted to show in the README that training the teacher yourself is possible. I'll update the README as you suggested.
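
For reference, a minimal sketch of loading the provided teacher checkpoint instead of retraining. The class name `TeacherNet` and the checkpoint path are assumptions for illustration, not the repo's confirmed names:

```python
import torch

# Hypothetical names: adjust `TeacherNet` and the checkpoint path to the
# actual class and file shipped with this repo.
from models import TeacherNet

teacher = TeacherNet(num_classes=10)
state = torch.load('pretrained/teacher_cifar10.pt', map_location='cpu')
teacher.load_state_dict(state)
teacher.eval()  # frozen teacher: only used to produce soft targets
```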

da2so — Aug 26 '21 04:08

Thanks for your reply. One more question: the original paper uses cross-entropy loss, while your code uses BCE loss. Is that the right thing to do?
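
For context, a minimal sketch contrasting the two objectives, using the `outputs_S`/`outputs_T` logit names that appear later in this thread. The temperature value and the exact BCE form are assumptions, not the repo's confirmed code:

```python
import torch
import torch.nn.functional as F

T = 20.0                        # temperature; value here is illustrative
outputs_S = torch.randn(8, 10)  # student logits (dummy batch)
outputs_T = torch.randn(8, 10)  # teacher logits (dummy batch)

# Paper-style objective: cross-entropy between temperature-softened
# distributions, commonly computed as a KL divergence and rescaled by T^2.
kd_ce = F.kl_div(
    F.log_softmax(outputs_S / T, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction='batchmean',
) * (T * T)

# One possible BCE formulation: each class score is matched independently
# as a Bernoulli probability rather than as part of a single categorical
# distribution. The repo's actual BCE form may differ.
kd_bce = F.binary_cross_entropy(
    torch.sigmoid(outputs_S),
    torch.sigmoid(outputs_T),
)
```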

Sharpiless — Aug 26 '21 08:08

There are also some implementation details that differ from the original paper.

For MNIST, the batch size in the paper is 512 and the generator learning rate is 3.0, with 24k synthesized samples. Also, the temperature division should be applied to outputs_S rather than outputs_T (see the sketch below). Would you be willing to re-implement these details in this repo?
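
To make the temperature point concrete, here is a sketch of the two variants. A KL-style loss is used purely for illustration; which variant the code actually computes should be checked against the repo's source:

```python
import torch
import torch.nn.functional as F

T = 20.0                        # temperature; value is illustrative
outputs_S = torch.randn(8, 10)  # student logits (dummy batch)
outputs_T = torch.randn(8, 10)  # teacher logits (dummy batch)

# Reading of the current code: only the teacher logits are softened.
loss_as_is = F.kl_div(
    F.log_softmax(outputs_S, dim=1),
    F.softmax(outputs_T / T, dim=1),
    reduction='batchmean',
)

# The change suggested above: apply the division to outputs_S instead.
loss_suggested = F.kl_div(
    F.log_softmax(outputs_S / T, dim=1),
    F.softmax(outputs_T, dim=1),
    reduction='batchmean',
)
```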

Sharpiless — Aug 26 '21 09:08