United States of America
Think deeper, decode faster
FasterDecoding
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
REST: Retrieval-Based Speculative Decoding, NAACL 2024