Heegon Jin comments

Results: 6 comments by Heegon Jin

Is there any way to distill translation models?

@AIikai @robotsp @VarunGumma I am also trying to implement KD by modifying fairseq.

Knowledge Distillation

@VarunGumma I am also trying to implement KD by modifying fairseq.

Knowledge Distillation

@VarunGumma Nice work! It might be even better if it supported attention-based distillation, as in TinyBERT and MiniLM. I will try those. Thanks.
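As a rough illustration of the attention-based distillation mentioned above, here is a minimal sketch assuming a MiniLM-style objective: the KL divergence between the teacher's and the student's self-attention distributions for a single head. The function names `softmax` and `attention_kl` are hypothetical and not part of fairseq or of any repository in this thread. TinyBERT instead uses a mean-squared error between attention matrices, which is not shown here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_kl(teacher_scores, student_scores, eps=1e-12):
    """Mean KL(teacher || student) over query positions for one head.

    Both arguments are lists of rows of raw (pre-softmax) attention
    scores with identical shapes: [num_queries][num_keys].
    This is an illustrative helper, not a fairseq API.
    """
    if len(teacher_scores) != len(student_scores):
        raise ValueError("teacher and student must have the same number of rows")
    total = 0.0
    for t_row, s_row in zip(teacher_scores, student_scores):
        if len(t_row) != len(s_row):
            raise ValueError("teacher and student rows must have the same length")
        p = softmax(t_row)  # teacher attention distribution
        q = softmax(s_row)  # student attention distribution
        total += sum(
            pi * (math.log(pi + eps) - math.log(qi + eps))
            for pi, qi in zip(p, q)
        )
    return total / len(teacher_scores)
```

In a real training loop this term would be computed on tensors from the last (or a chosen) layer and added to the usual translation loss with some weight; that weighting is a design choice and is not specified anywhere in this thread.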

@VarunGumma Hello, I tried to run your code after `pip install --editable ./`, but it fails with `fairseq-train: error: unrecognized arguments: --distillation-strategy batch_level --distillation-rate 0.5 --temperature 2.5 --temperature-schedule none --alpha-kd 5`.

Knowledge Distillation

@VarunGumma Could you please give a little more detail about the dataset you used and the option `--user-dir $custom_model_dir`?

Knowledge Distillation

> @HeegonJin I have a basic implementation of KD in my repo [here](https://github.com/VarunGumma/fairseq). It is based on the implementation at https://github.com/LeslieOverfitting/selective_distillation. They have a much older version of `fairseq` and I...