LLMSpeculativeSampling

Published 20 hours ago

feifeibear


Metadata

Fast inference from large language models via speculative decoding

Readme
Issues

Results 3 LLMSpeculativeSampling issues


Output logits do not match: question about decoding when the draft model and target model are the same

4

In my opinion, the generation should be the same when the draft model and the target model are the same and the temperature is 0. But in this case, the output logits of...

66RING
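
The expectation in this issue can be illustrated with the acceptance test from the speculative decoding literature, where a drafted token x is accepted with probability min(1, p(x)/q(x)), p being the target distribution and q the draft distribution. The sketch below is not taken from this repository: the function names `greedy_distribution` and `accept_draft_token` are hypothetical, and it assumes exact arithmetic, so that two identical models yield identical distributions.

```python
import random


def greedy_distribution(logits):
    """Temperature 0 (assumed to mean greedy): all mass on the argmax token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def accept_draft_token(token, target_probs, draft_probs, rng=random):
    """Accept the drafted token with probability min(1, p(token) / q(token))."""
    p = target_probs[token]
    q = draft_probs[token]
    if q == 0.0:
        # The draft model could not have proposed this token.
        return False
    # rng.random() lies in [0, 1), so a ratio of 1 always accepts.
    return rng.random() < min(1.0, p / q)


# Hypothetical logits shared by the draft and target model.
logits = [0.1, 2.5, -1.0]
probs = greedy_distribution(logits)
drafted = probs.index(1.0)  # the token the draft model proposes
always_accepted = all(
    accept_draft_token(drafted, probs, probs) for _ in range(1000)
)
```

When both distributions are identical the ratio is exactly 1, so every drafted token is accepted and the output should match plain greedy decoding. A mismatch seen in practice would therefore likely come from elsewhere, for example numerical differences between cached and uncached forward passes, though that is only a guess here.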

documentation

What is the difference between this project and the speculative method implemented in transformers?

wiluen

Bloom's KV caches are both (bs, head, seq, head_dim) in the new version of transformers

Hi, thanks for your awesome demo of speculative sampling. Some of your code may be outdated in the new version of transformers. In the `KVCacheModel` class, the Bloom model's k cache shape is...

shadow150519