fairseq2

[LayerSkip] Self-Speculative Decoding

Status: Open · mostafaelhoushi opened this issue 7 months ago · 0 comments

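Describe the solution you would like: Implement self-speculative decoding as described in this paper. The earlier layers of the model act as the draft stage, generating candidate tokens by exiting early, and the remaining layers act as the verification stage, completing the forward pass to check those candidates.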
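A minimal sketch of the idea, assuming greedy decoding and a toy integer-valued "model" in place of a real transformer. `ToyLayeredLM`, `next_token`, `exit_layer` and `num_draft` are hypothetical names, not fairseq2 APIs. The draft stage exits after the first `exit_layer` layers and feeds that hidden state into the same LM head; the verification stage runs all layers and keeps the longest prefix of drafted tokens it agrees with. Under greedy decoding this should reproduce the full model's output exactly, which the last lines check.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical toy "layer": maps an integer hidden state to a new one.
Layer = Callable[[int], int]


class ToyLayeredLM:
    """Stand-in for a decoder-only LM: toy embedding -> stack of layers -> shared LM head."""

    def __init__(self, layers: Sequence[Layer], vocab_size: int) -> None:
        self.layers = list(layers)
        self.vocab_size = vocab_size

    def next_token(self, tokens: Sequence[int], num_layers: Optional[int] = None) -> int:
        # Greedy next token after running the first `num_layers` layers (all by default).
        # num_layers < len(self.layers) is an early exit into the same LM head.
        n = len(self.layers) if num_layers is None else num_layers
        h = sum((i + 1) * t for i, t in enumerate(tokens))  # toy "embedding" of the context
        for layer in self.layers[:n]:
            h = layer(h)
        return h % self.vocab_size  # toy LM head


def greedy_generate(model: ToyLayeredLM, prompt: Sequence[int], max_new_tokens: int) -> List[int]:
    # Reference: plain greedy decoding with the full model.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model.next_token(tokens))
    return tokens


def self_speculative_generate(
    model: ToyLayeredLM,
    prompt: Sequence[int],
    max_new_tokens: int,
    exit_layer: int,
    num_draft: int,
) -> List[int]:
    tokens = list(prompt)
    target_len = len(tokens) + max_new_tokens
    while len(tokens) < target_len:
        # Draft stage: the first `exit_layer` layers propose up to `num_draft` tokens.
        draft: List[int] = []
        for _ in range(min(num_draft, target_len - len(tokens))):
            draft.append(model.next_token(tokens + draft, num_layers=exit_layer))
        # Verification stage: the full model checks each drafted position in order.
        # A real implementation would score all positions in one forward pass;
        # this toy calls the model once per position.
        accepted = 0
        for i, tok in enumerate(draft):
            if model.next_token(tokens + draft[:i]) != tok:
                break
            accepted += 1
        tokens.extend(draft[:accepted])
        # Append one full-model token: the correction at the first mismatch,
        # or a "bonus" token when every drafted token was accepted.
        if len(tokens) < target_len:
            tokens.append(model.next_token(tokens))
    return tokens


model = ToyLayeredLM(
    layers=[lambda h: 3 * h + 1, lambda h: h + 2, lambda h: h if h % 4 else h + 1],
    vocab_size=50,
)
prompt = [3, 1, 4]
out = self_speculative_generate(model, prompt, max_new_tokens=20, exit_layer=2, num_draft=4)
assert out == greedy_generate(model, prompt, max_new_tokens=20)
```

In a real model, the speedup comes from verifying all drafted tokens in a single forward pass and from the draft sharing weights (and, ideally, the early layers' cached computation) with the verifier. This toy makes one call per position and only demonstrates that the accepted output matches greedy decoding.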

Describe the alternatives you have considered: There are several ways to implement this:

  • Implement regular speculative decoding, where the draft stage is a separate model; self-speculative decoding can then be implemented by passing a subset of the layers as the draft model (e.g., this implementation).
    • With this setup, we could add flags that tell the earlier layers whether they are running the draft stage or the verification stage.
  • Implement self-speculative decoding directly, as done here.
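The first option can be sketched as a speculative decoder that only sees two next-token callables, with self-speculative decoding obtained by wrapping the first layers of the same decoder as the draft. The `Stage` flag passed to every layer illustrates the flags mentioned in the sub-bullet. Everything here (`Stage`, `ToyLayer`, `EarlyExitDraft`, `speculative_generate`) is hypothetical and not part of fairseq2; integer functions stand in for transformer blocks, and greedy decoding is assumed.

```python
from enum import Enum
from typing import Callable, Dict, List, Sequence

# Any callable mapping a token prefix to the greedy next token can be a draft or a target.
NextTokenFn = Callable[[Sequence[int]], int]


class Stage(Enum):
    # Hypothetical flag telling each layer which stage it is running in.
    DRAFT = "draft"
    VERIFY = "verify"


class ToyLayer:
    """Hypothetical layer; here it only counts calls per stage, but a real layer
    could use the flag, e.g., to choose which cache entries to read or write."""

    def __init__(self, fn: Callable[[int], int]) -> None:
        self.fn = fn
        self.calls: Dict[Stage, int] = {Stage.DRAFT: 0, Stage.VERIFY: 0}

    def __call__(self, h: int, stage: Stage) -> int:
        self.calls[stage] += 1
        return self.fn(h)


class ToyDecoder:
    """Stand-in decoder: toy embedding -> stack of ToyLayers -> shared toy LM head."""

    def __init__(self, layers: Sequence[ToyLayer], vocab_size: int) -> None:
        self.layers = list(layers)
        self.vocab_size = vocab_size

    def predict(self, tokens: Sequence[int], num_layers: int, stage: Stage) -> int:
        h = sum((i + 1) * t for i, t in enumerate(tokens))
        for layer in self.layers[:num_layers]:
            h = layer(h, stage)
        return h % self.vocab_size


class FullModel:
    """Target model: runs every layer, flagged as the verification stage."""

    def __init__(self, decoder: ToyDecoder) -> None:
        self.decoder = decoder

    def __call__(self, tokens: Sequence[int]) -> int:
        return self.decoder.predict(tokens, len(self.decoder.layers), Stage.VERIFY)


class EarlyExitDraft:
    """Draft model that is only a view over the first `exit_layer` layers of the decoder."""

    def __init__(self, decoder: ToyDecoder, exit_layer: int) -> None:
        self.decoder = decoder
        self.exit_layer = exit_layer

    def __call__(self, tokens: Sequence[int]) -> int:
        return self.decoder.predict(tokens, self.exit_layer, Stage.DRAFT)


def speculative_generate(
    draft: NextTokenFn, target: NextTokenFn, prompt: Sequence[int], max_new_tokens: int, num_draft: int
) -> List[int]:
    """Generic greedy speculative decoding; it knows nothing about layers."""
    tokens = list(prompt)
    end = len(tokens) + max_new_tokens
    while len(tokens) < end:
        proposal: List[int] = []
        for _ in range(min(num_draft, end - len(tokens))):
            proposal.append(draft(tokens + proposal))
        for tok in proposal:  # keep the longest prefix the target agrees with
            if target(tokens) != tok:
                break
            tokens.append(tok)
        if len(tokens) < end:  # correction at the first mismatch, or a bonus token
            tokens.append(target(tokens))
    return tokens


decoder = ToyDecoder(
    [ToyLayer(f) for f in (lambda h: 3 * h + 1, lambda h: h + 2, lambda h: h if h % 4 else h + 1)],
    vocab_size=50,
)
target = FullModel(decoder)
# Self-speculative decoding: the draft is the first two layers of the same decoder.
out = speculative_generate(EarlyExitDraft(decoder, exit_layer=2), target, [3, 1, 4], 15, num_draft=4)
# Regular speculative decoding: any separate next-token function can serve as the draft.
out_separate = speculative_generate(lambda toks: len(toks) % 50, target, [3, 1, 4], 15, num_draft=3)
```

One consequence of this design is that the decoding loop stays agnostic to where the draft comes from, so a separate draft model and an early-exit view are interchangeable; any sharing between the two stages (such as cached states) then has to be coordinated through per-layer flags like `stage`.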

Additional Context:

mostafaelhoushi · Jul 08 '24 17:07