text-generation-inference
Add complete guide on speculative decoding
What does this PR do?
This PR adds a comprehensive guide to speculative decoding techniques for speeding up inference. It includes a minimal implementation of the core ideas from the paper *Accelerating Large Language Model Decoding with Speculative Sampling*, applied to GPT-2, along with detailed explanations of the underlying mathematics and an intuitive breakdown of why speculative decoding works.
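To give a flavor of what the guide covers, here is a sketch (not the PR's actual code) of the accept/reject rule at the heart of speculative sampling from that paper: a draft model proposes a token from its distribution `q`, which is accepted with probability `min(1, p(x)/q(x))` under the target distribution `p`, and on rejection a token is resampled from the normalized residual `max(0, p - q)`. The function and variable names below are illustrative, and toy distributions stand in for real GPT-2 logits.

```python
import numpy as np

def speculative_step(p, q, rng):
    """One accept/reject step of speculative sampling (illustrative sketch).

    p: target-model distribution over the vocabulary (1-D array, sums to 1)
    q: draft-model distribution over the vocabulary (1-D array, sums to 1)
    Returns a token id whose distribution matches the target p exactly.
    """
    x = rng.choice(len(q), p=q)               # draft model proposes x ~ q
    if rng.random() < min(1.0, p[x] / q[x]):
        return x                              # accept the draft token
    residual = np.maximum(p - q, 0.0)         # on rejection, resample from
    residual /= residual.sum()                # the normalized max(0, p - q)
    return rng.choice(len(residual), p=residual)

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])   # toy target (large-model) distribution
q = np.array([0.2, 0.5, 0.3])   # toy draft (small-model) distribution
samples = [speculative_step(p, q, rng) for _ in range(20000)]
# Despite sampling proposals from q, the accepted output follows p.
freq = np.bincount(samples, minlength=3) / len(samples)
```

The key property, proved in the paper, is that this scheme is exact: the output token is distributed according to the target model, so speed comes only from accepting cheap draft tokens, never from changing the sampling distribution.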
This PR complements the speculation.md file in the TGI repository, offering additional practical and theoretical insight into speculative sampling.