speculative-decoding topic
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
SpecDec
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
aphrodite-engine
Large-scale LLM inference engine
Sequoia
scalable and robust tree-based speculative decoding algorithm
TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
speculative_decoding.c
minimal C implementation of speculative decoding based on llama2.c
REST
REST: Retrieval-Based Speculative Decoding, NAACL 2024
Speculative-Decoding
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
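
For readers new to the topic, the accept/reject rule that the repositories above build on can be sketched in a few lines. This is a minimal illustration of the speculative sampling step described by Leviathan et al., not code taken from any listed project. The function names (`speculative_step`, `residual_distribution`, `sample`) are hypothetical, and the draft and target distributions are assumed to be given as plain lists of probabilities rather than produced by real models.

```python
import random


def residual_distribution(p, q):
    """Normalize max(0, p - q); used to resample after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p and q are identical here, so fall back to the target distribution.
        return list(p)
    return [d / total for d in diff]


def sample(dist, rng):
    """Draw one token index from a probability list."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def speculative_step(draft_tokens, q_dists, p_dists, rng):
    """One verification step (hypothetical helper, for illustration only).

    draft_tokens: tokens proposed by the small draft model
    q_dists:      draft distribution at each proposed position
    p_dists:      target distribution at each proposed position, plus one
                  extra distribution for the position after the last draft token
    Returns the accepted tokens followed by one corrected or bonus token.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        # Accept the draft token with probability min(1, p(x) / q(x)).
        accept_prob = min(1.0, p[tok] / q[tok])
        if rng.random() < accept_prob:
            out.append(tok)
        else:
            # On rejection, resample from the normalized residual and stop.
            out.append(sample(residual_distribution(p, q), rng))
            return out
    # Every draft token was accepted: take one extra token from the target.
    out.append(sample(p_dists[len(draft_tokens)], rng))
    return out
```

When the draft and target distributions agree, every proposed token is accepted and the step yields one more token than was drafted; when they disagree, the residual resampling keeps the output distributed according to the target model.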