fairseq2
[LayerSkip] Early Exit Loss
Describe the solution you would like:
- Enable the training script to access outputs of intermediate layers
- Modify the loss function to incorporate the outputs of earlier layers
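To make the second point concrete, here is a minimal sketch of an early exit loss in plain PyTorch. It assumes the training script already has logits for each exit layer (for example, by applying the shared output projection to each intermediate hidden state). The function name `early_exit_loss`, the `layer_weights` parameter, and the uniform default weighting are illustrative assumptions, not an existing fairseq2 API or the weighting scheme of any particular paper.

```python
import torch
import torch.nn.functional as F


def early_exit_loss(layer_logits, targets, layer_weights=None):
    """Weighted sum of cross-entropy losses, one term per exit layer.

    layer_logits: list of tensors shaped (batch, seq, vocab), one per exit
        layer, with the final layer last (hypothetical convention).
    targets: tensor shaped (batch, seq) holding class indices.
    layer_weights: optional list of floats, one per entry in layer_logits.
        Defaults to uniform weights, which is an assumption of this sketch.
    """
    num_exits = len(layer_logits)
    if layer_weights is None:
        layer_weights = [1.0 / num_exits] * num_exits
    if len(layer_weights) != num_exits:
        raise ValueError("need exactly one weight per exit layer")

    # Start from a scalar zero on the same device/dtype as the logits.
    total = layer_logits[0].new_zeros(())
    for weight, logits in zip(layer_weights, layer_logits):
        # Flatten (batch, seq, vocab) -> (batch * seq, vocab) for cross_entropy.
        loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
        total = total + weight * loss
    return total
```

Passing a single-element list containing only the final layer's logits reduces this to the ordinary cross-entropy loss, so an existing training loop could adopt it without changing behavior until intermediate outputs are supplied.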
Describe the alternatives you have considered: Possible implementation approaches:
- Store the output of each layer in a dictionary attribute of the model
- Return the outputs of intermediate layers as an additional return value of the forward() function (the disadvantage is that changing the return value can break existing training loops)
- Use the existing hook mechanism
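As a rough illustration of the hook-based option, the sketch below uses PyTorch's standard `register_forward_hook` to capture per-layer outputs without changing the signature of `forward()`. The `LayerOutputRecorder` class and the toy stack of `nn.Linear` layers are hypothetical stand-ins; this is not fairseq2's own hook mechanism, whose interface may differ.

```python
import torch
from torch import nn


class LayerOutputRecorder:
    """Collects the output of each given layer during a forward pass."""

    def __init__(self, layers):
        self.outputs = {}  # layer index -> most recent output of that layer
        self._handles = [
            layer.register_forward_hook(self._make_hook(index))
            for index, layer in enumerate(layers)
        ]

    def _make_hook(self, index):
        def hook(module, inputs, output):
            # Returning None leaves the layer's output unchanged.
            self.outputs[index] = output

        return hook

    def remove(self):
        """Detach all hooks so the model behaves as before."""
        for handle in self._handles:
            handle.remove()
        self._handles.clear()


# Toy stand-in for a stack of decoder layers (an assumption of this sketch).
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])
model = nn.Sequential(*layers)

recorder = LayerOutputRecorder(layers)
final_output = model(torch.randn(2, 8))
# recorder.outputs now maps 0, 1, 2 to the outputs of the three layers.
```

Because the recorder lives outside the model, existing training loops keep working unchanged, and calling `recorder.remove()` restores the original behavior. The stored tensors stay attached to the autograd graph, so they can feed an auxiliary loss, at the cost of holding them in memory until the next forward pass.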
Additional Context: This is to enable implementing ideas from various papers such as: