LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Results 3 LLMSpeculativeSampling issues

In my opinion, the generation should be the same when the draft model and the target model are the same and the temperature is 0. But in this case, the output logits of...
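
The expectation stated in this issue can be illustrated with a small sketch. It is not code from the repository: it assumes the standard speculative sampling acceptance rule (accept a draft token with probability min(1, p/q), where q is the draft probability and p is the target probability), and the helper names `greedy_distribution` and `accept_draft_token` as well as the logits are made up for illustration. Under those assumptions, identical models at temperature 0 yield identical one-hot distributions, so every draft token should be accepted.

```python
import random

def greedy_distribution(logits):
    """Temperature 0 collapses the distribution to a one-hot vector on the argmax."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]

def accept_draft_token(token, draft_probs, target_probs, rng):
    """Assumed acceptance test: accept with probability min(1, p / q)."""
    q = draft_probs[token]   # probability the draft model gave the token
    p = target_probs[token]  # probability the target model gives the token
    return rng.random() < min(1.0, p / q)

rng = random.Random(0)
logits = [0.1, 2.3, -1.0, 0.7]              # made-up logits, 4-token vocabulary
draft_probs = greedy_distribution(logits)
target_probs = greedy_distribution(logits)  # same model, so same distribution
token = draft_probs.index(1.0)              # the draft model's greedy choice

# With p == q == 1.0 the ratio is 1, so the token is accepted on every trial.
accepted = all(
    accept_draft_token(token, draft_probs, target_probs, rng)
    for _ in range(1000)
)
```

If the two sets of logits differ at all, the argmax can change and the guarantee above no longer holds, which is why a mismatch in the output logits matters here.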

documentation

Hi, thanks for your awesome demo of speculative sampling. Some of your code may be outdated in the new version of transformers. In the `KVCacheModel` class, the Bloom model's k cache shape is...