LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
LLMSpeculativeSampling issues (3)
In my opinion, the generation should be the same when the draft model and the target model are the same and the temperature is 0. But in this case, the output logits of...
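The expectation in this issue follows from the speculative-sampling acceptance rule: a drafted token is kept with probability min(1, p(token)/q(token)), where p is the target distribution and q is the draft distribution. When the two models are identical and the temperature is 0, both distributions are the same one-hot greedy distribution, so the ratio is 1 and every drafted token is accepted. A minimal sketch (not code from this repo; `accept_draft_token` is a hypothetical helper):

```python
import numpy as np

def accept_draft_token(p_target, q_draft, token, rng):
    """Speculative-sampling acceptance rule: keep the drafted token with
    probability min(1, p(token) / q(token))."""
    ratio = p_target[token] / q_draft[token]
    return rng.random() < min(1.0, ratio)

rng = np.random.default_rng(0)
# Identical draft and target at temperature 0: both collapse to the same
# one-hot (greedy) distribution, so the ratio is exactly 1.
p = np.array([0.0, 1.0, 0.0])   # one-hot target distribution
q = p.copy()                    # identical draft distribution
token = int(np.argmax(q))       # draft model's greedy pick
# rng.random() lies in [0, 1), so min(1, 1) = 1 always accepts.
assert all(accept_draft_token(p, q, token, rng) for _ in range(100))
```

Since every draft token is accepted, the decoded sequence must match plain greedy decoding of the target model, which is why differing output logits indicate a bug rather than expected sampling noise.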
documentation
Hi, thanks for your awesome demo of speculative sampling. Some of your code may be outdated in newer versions of transformers. In the `KVCacheModel` class, the Bloom model's K-cache shape is...
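The breakage described here is typical of KV-cache rollback code: after rejecting draft tokens, the cache must be trimmed back along its sequence axis, but which axis holds `seq_len` is model- and version-dependent (Bloom's legacy cache layout differs from the common `[batch, heads, seq_len, head_dim]` one). A minimal numpy sketch of the idea, with hypothetical helper and parameter names and the sequence axes made explicit rather than hard-coded:

```python
import numpy as np

def trim_kv_cache(past_key_values, keep_len, seq_axis_k=2, seq_axis_v=2):
    """Roll a per-layer (key, value) cache back to `keep_len` tokens by
    slicing along the sequence axis. The defaults assume the common
    [batch, heads, seq_len, head_dim] layout; models with a different
    cache layout must pass the axes that actually hold seq_len, which is
    exactly what breaks when a library version changes the layout."""
    trimmed = []
    for k, v in past_key_values:
        k = np.take(k, range(keep_len), axis=seq_axis_k)
        v = np.take(v, range(keep_len), axis=seq_axis_v)
        trimmed.append((k, v))
    return tuple(trimmed)

# One layer, common layout: batch=1, heads=4, seq_len=10, head_dim=8.
pkv = ((np.zeros((1, 4, 10, 8)), np.zeros((1, 4, 10, 8))),)
out = trim_kv_cache(pkv, 7)
assert out[0][0].shape == (1, 4, 7, 8)
```

Parameterizing the axis (or normalizing every model's cache to one canonical layout on entry) is one way to keep such a class working across library versions.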