JeffKatzy
@neubig I'm interested in this and can take a look this weekend.
@neubig Ok, so I think the easiest approach would be to implement this with Hugging Face's assistant model feature. I found some references to this [with whisper](https://huggingface.co/blog/whisper-speculative-decoding), and another hf blog...
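For context, here is a minimal sketch of the draft-then-verify loop that speculative decoding relies on. It is a toy, greedy-only version in plain Python, not Hugging Face's implementation: `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical names made up for illustration, and a "model" is assumed to be just a function from a token prefix to the next token. In `transformers`, the equivalent idea is exposed by passing `assistant_model` to `generate`.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding sketch: the output matches plain greedy decoding with `target`."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed position.
        #    (A real implementation scores all positions in one batched forward pass.)
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)
            if expected != guess:
                break  # first mismatch: keep the target's token, discard the rest of the draft

        tokens.extend(accepted)
    return tokens


# Hypothetical toy models: the target counts upward modulo 10,
# and the draft agrees except right after the token 4.
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], max_new_tokens=8, k=4))
```

The speedup comes from step 2: when the draft is usually right, the target confirms several tokens per pass instead of producing one. This sketch leaves out the extra "bonus" token a real implementation gets when every drafted token is accepted, and it does not cover sampling.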
Ok, that helps. I was looking at the implementation portion, where they talk about speculative decoding in the blog post. > "We achieve speeds of >1000 tokens/s (just under 4000...
Ok, I didn't realize there were licensing issues. @neubig I can take another shot at this. I was exploring the [Jedi pip package](https://github.com/davidhalter/jedi), which has an MIT license [here](https://github.com/davidhalter/jedi/blob/master/LICENSE.txt). Is that...