speculative-decoding
About batch size > 1
First of all, thank you for open-sourcing the implementation of speculative decoding at batch size > 1. Is it possible to use it directly with models downloaded from Hugging Face, instead of customizing their framework code? I tried it with CodeGen, but the generated output is garbled. I would appreciate an answer whenever it is convenient for you.