CoFiPruning
Why prepruning distillation?
Hi, I have a question about the intuition behind the prepruning distillation step. Why is the student model not initialized from the teacher weights, rather than from scratch (or from the BERT checkpoint pretrained with MLM)?