Varun Gumma
@AIikai @robotsp I am also looking to distill and prune a few LLMs. Any leads?
@robotsp @AIikai @HeegonJin please head [here](https://github.com/facebookresearch/fairseq/issues/4738) for KD in fairseq.
@HeegonJin I have a basic implementation of KD in my repo [here](https://github.com/VarunGumma/fairseq). It is based on the implementation at https://github.com/LeslieOverfitting/selective_distillation. They use a much older version of `fairseq`, and I...
Please use the latest version of my code; you can find an example of `knowledge_distillation_translation` in the examples folder. As this work is in progress, I make multiple bug...
@HeegonJin I use a custom model architecture, which I defined in a file in that directory (`$custom_model_dir`). If you are using models (parent and student) that are defined in `fairseq`...
> > @HeegonJin I have a basic implementation of KD in my repo [here](https://github.com/VarunGumma/fairseq). It is based on the implementation at https://github.com/LeslieOverfitting/selective_distillation. They use a much older version of `fairseq` and...
Will fairseq-v2 support PyTorch 2.0?
Any update on v2?
Just a dumb question. I am training a transformer model using `fairseq` and want to use `xformers`. Is it enough if I install the `xformers` library in my environment and start...
Does PyTorch support Python 3.11?