
Results 6 comments of MMuzzammil1

Hi @0xDEADFED5. I created this issue for the "Phi-2" model (https://huggingface.co/microsoft/phi-2). I'm not sure about the behaviour of Llama-3.

I'll run the benchmarks to check that. But @0xDEADFED5, isn't the decode speed at least independent of the prompt input?
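To make the question concrete: a minimal, model-agnostic sketch of how one could check whether decode TPS varies with prompt length. `fake_generate` is a hypothetical stand-in for a real model's generate call (not Phi-2 or vLLM specific); it simulates a constant per-token decode cost, so measured TPS should come out roughly the same for short and long prompts.

```python
import time

def measure_decode_tps(generate_fn, prompt, max_new_tokens=128):
    """Time token generation and return tokens/second (decode speed).

    generate_fn is a placeholder for any model's generate call; it is
    assumed to return the number of tokens actually produced.
    """
    start = time.perf_counter()
    n_tokens = generate_fn(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

def fake_generate(prompt, max_new_tokens):
    # Hypothetical model: per-token decode step costs the same
    # regardless of how long the prompt is.
    for _ in range(max_new_tokens):
        time.sleep(0.0005)  # simulated decode step
    return max_new_tokens

short_tps = measure_decode_tps(fake_generate, "short prompt")
long_tps = measure_decode_tps(fake_generate, "x" * 4000)
print(short_tps > 0 and long_tps > 0)
```

With a real model the comparison would be the same shape: swap `fake_generate` for the actual generate call and compare TPS across prompt lengths (in practice, long prompts can still slow decode somewhat via a larger KV cache, which is exactly what a benchmark like this would surface).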

I think this issue has now been fixed in the v0.6.2 release of vllm. Please see this: https://github.com/vllm-project/vllm/pull/8790.

> [@hongyanz](https://github.com/hongyanz) By the way, this is the accept length for Qwen3-8B-Eagle3 in code generation, and its TPS (tokens per second) can reach nearly 500. @jiahe7ay May I ask which...
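For intuition on how accept length relates to TPS: in speculative decoding, each target-model verification step can emit up to the accepted draft tokens plus one bonus token, so throughput scales roughly with the mean accept length, minus draft-model overhead. A back-of-the-envelope sketch (the baseline TPS and overhead fraction here are hypothetical illustration values, not measured Qwen3-8B-Eagle3 numbers):

```python
def estimated_tps(base_tps, accept_length, draft_overhead=0.2):
    """Rough speculative-decoding throughput estimate.

    accept_length: mean tokens emitted per target verification step.
    draft_overhead: fraction of a target step spent running the draft
    model (hypothetical value; real overhead depends on models/hardware).
    """
    return base_tps * accept_length / (1.0 + draft_overhead)

# A hypothetical 150 TPS autoregressive baseline with accept length ~4:
print(round(estimated_tps(150, 4.0), 1))  # → 500.0
```

This is only a first-order estimate; real TPS also depends on batch size, acceptance variance, and how much of the draft/verify work overlaps.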

> > > [@hongyanz](https://github.com/hongyanz) By the way, this is the accept length for Qwen3-8B-Eagle3 in code generation, and its TPS (tokens per second) can reach nearly 500. > > >...

@jiahe7ay do you have any results at temperature=1 for this draft model? Or have you mostly tested it at t=0?
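The reason temperature matters: the expected per-token acceptance probability in speculative sampling is Σ_x min(p(x), q(x)) over target distribution p and draft distribution q. At t→0 both distributions collapse toward their argmax, so accept length tends to look better than at t=1, where the distributions are flatter and draft/target tail mismatch shows up. A toy illustration with made-up logits (not from any real model):

```python
import math

def softmax(logits, t):
    """Temperature-scaled softmax over a list of logits."""
    m = max(l / t for l in logits)
    exps = [math.exp(l / t - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def accept_rate(target_logits, draft_logits, t):
    """Expected speculative-sampling acceptance prob: sum_x min(p, q)."""
    p = softmax(target_logits, t)
    q = softmax(draft_logits, t)
    return sum(min(pi, qi) for pi, qi in zip(p, q))

# Hypothetical logits: draft and target agree on the top token
# but disagree in the tail.
target = [3.0, 1.0, 0.5, 0.2]
draft = [3.0, 0.5, 1.0, 0.1]

# Near-greedy sampling accepts more than t=1 for these distributions.
print(accept_rate(target, draft, t=0.1) > accept_rate(target, draft, t=1.0))  # → True
```

So accept-length numbers reported at t=0 usually will not carry over unchanged to t=1, which is why results at both settings are worth comparing.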