Varun Gumma
@EmreOzkose Yes, PyTorch 2.0 supports Python 3.11
Alternatively, you can use [my fork of fairseq](https://github.com/VarunGumma/fairseq), which supports `Python 3.11`, Knowledge Distillation, Adapters, and a few more interesting fixes.
Hi, when you use `torch.compile`, do you get a bunch of logging messages? I tried adding `torch.compile` exactly the same way you did, but my terminal is flooded with warnings...
@santha96 did you just leave the logging messages like that, or were you able to suppress them?
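For anyone hitting the same log flood: a minimal sketch of one way to silence it via the standard `logging` module, assuming the messages come from the `torch._dynamo` and `torch._inductor` loggers (the exact logger names may differ across PyTorch versions):

```python
import logging

# Assumed logger names for torch.compile internals; adjust for your
# PyTorch version if the warnings still appear.
for name in ("torch._dynamo", "torch._inductor"):
    logging.getLogger(name).setLevel(logging.ERROR)  # keep only errors
```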
Hi @bhavitvyamalik, I have a [clone of fairseq](https://github.com/VarunGumma/fairseq) that implements adapters. Feel free to use it, and if you face any issues or want more features, open a pull request,...
Hi @EIFY, I have been using your implementation of [fairseq](https://github.com/EIFY/fairseq), and I had the following question: - In the [transformer_decoder](https://github.com/EIFY/fairseq/blob/main/fairseq/models/transformer/transformer_decoder.py), I see that the alibi bias is being added to...
@gwenzek any update on the documentation?
Hi @GokulNC, do you have any leads on this? We are interested in trying this for IT2-Dist models as well, where the embedding and output-projection (`lm_head`) are tied in the...
@umarbutler can you share your implementation?
@tfglynn, can xPos be used for a regular encoder-decoder model? If so, from your answer above, I assume that it should be added to the decoder side only and...