Zipformer MVQ
This PR adds knowledge distillation as a training option in the Zipformer recipe. The knowledge distillation method is MVQ-KD.
The teacher targets can be downloaded via the following command:
./distillation_with_hubert.sh --stage 2 --stop_stage 2
To turn on knowledge distillation, pass `--enable-distillation True`. It is applicable to both streaming and non-streaming Zipformers.
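For reference, a minimal training invocation could look like the sketch below. Only `--enable-distillation` is the flag introduced by this PR; the entry point and the remaining options (`--world-size`, `--num-epochs`, `--exp-dir`, `--max-duration`) are assumptions based on the usual icefall recipe layout and should be adapted to your setup.

```bash
# Sketch: launch Zipformer training with MVQ-KD enabled.
# The script path and all options other than --enable-distillation are
# assumptions following the typical icefall recipe layout.
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --exp-dir zipformer/exp-distill \
  --max-duration 1000 \
  --enable-distillation True
```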
Detailed results will follow. Some preliminary results:
100 hours (LibriSpeech train-clean-100):
| model | test-clean WER (%) | test-other WER (%) |
|---|---|---|
| baseline, epoch-30-avg-9 | 5.97 | 15.73 |
| + mvq, epoch-30-avg-9 | 5.13 | 13.08 |
960 hours (full LibriSpeech):
| model | test-clean WER (%) | test-other WER (%) |
|---|---|---|
| baseline, epoch-30-avg-9 | 2.25 | 5.06 |
| + mvq, epoch-30-avg-9 | 2.18 | 4.86 |
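Here `epoch-30-avg-9` denotes decoding with the epoch-30 checkpoint averaged over the last 9 checkpoints, roughly as in the sketch below. The script name and the options other than `--epoch`/`--avg` are assumptions based on the standard icefall decoding setup, not part of this PR.

```bash
# Sketch: decode with checkpoint averaging (epoch 30, averaged over 9).
# Entry point and non --epoch/--avg options are assumptions following
# the usual icefall recipe layout.
./zipformer/decode.py \
  --epoch 30 \
  --avg 9 \
  --exp-dir zipformer/exp-distill \
  --decoding-method modified_beam_search
```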