Varun Gumma comments

Results: 17 comments of Varun Gumma

Is there any way to distill translation models?

@AIikai @robotsp I am also looking to distill and prune a few LLMs. Any leads?

Is there any way to distill translation models?

@robotsp @AIikai @HeegonJin please refer to [this issue](https://github.com/facebookresearch/fairseq/issues/4738) for KD in `fairseq`.

Knowledge Distillation

@HeegonJin I have a basic implementation of KD in my repo [here](https://github.com/VarunGumma/fairseq). It is based on the implementation at https://github.com/LeslieOverfitting/selective_distillation. They have a much older version of `fairseq`, and I...

Knowledge Distillation

Please use the latest version of my code; you can find an example of `knowledge_distillation_translation` in the examples folder. As this work is in progress, I make multiple bug...

Knowledge Distillation

@HeegonJin I use a custom model architecture, which I defined in a file in the `$custom_model_dir` directory. If you are using models (parent and student) which are defined in `fairseq`...

Knowledge Distillation

> > @HeegonJin I have a basic implementation of KD in my repo [here](https://github.com/VarunGumma/fairseq). It is based on the implementation at https://github.com/LeslieOverfitting/selective_distillation. They have a much older version of `fairseq` and...

fairseq v2

Will fairseq-v2 support PyTorch 2.0?

fairseq v2

Any update on v2?

Assertion compares dimension of key_padding_mask with query dimension in xformers MHA

Just a dumb question. I am training a transformer model using `fairseq` and want to use `xformers`. Is it enough if I install the `xformers` library in my environment and start...

Dataclass error while importing Fairseq in Python 3.11

Does PyTorch support Python 3.11?