
[Speculative decoding] Support target-model logprobs

Open cadedaniel opened this issue 1 year ago • 2 comments

This PR allows vLLM to return correct log-probabilities of sampled tokens when speculative decoding is enabled. In addition, if the user specifies logprobs in their request, the correct top-k logprobs are returned.

The returned log-probabilities are expected to match those produced when speculative decoding is not used.
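As a rough illustration of the user-facing behavior, the sketch below requests top-k logprobs from an engine with speculative decoding enabled. The model names are placeholders, and the engine arguments (speculative_model, num_speculative_tokens, use_v2_block_manager) reflect the vLLM API around the time of this PR rather than anything introduced here; they may differ in other versions.

```python
# Minimal sketch, assuming the vLLM Python API circa v0.4.x.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",        # placeholder target model
    speculative_model="JackFram/llama-68m",  # placeholder draft model
    num_speculative_tokens=5,
    use_v2_block_manager=True,               # some versions require this for spec decode
)

# Ask for the top-5 logprobs of each sampled token.
params = SamplingParams(temperature=0.0, max_tokens=32, logprobs=5)
outputs = llm.generate(["The capital of France is"], params)

for output in outputs:
    completion = output.outputs[0]
    # With this PR, each logprob entry should reflect the target model's
    # distribution, matching what non-speculative decoding would return.
    for token_id, logprob_dict in zip(completion.token_ids, completion.logprobs):
        print(token_id, logprob_dict)
```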

Testing

  • See https://github.com/vllm-project/vllm/pull/4378/files#diff-2d36a32d508a5729c33b7ef42e285d9f382ca997c10b437b005b15390b0450cb

cadedaniel avatar Apr 25 '24 23:04 cadedaniel

btw, warning: there will be a big sampler refactoring in this PR: https://github.com/vllm-project/vllm/pull/4309

rkooo567 avatar Apr 26 '24 09:04 rkooo567

thanks for the heads up; I think I can keep it decoupled

cadedaniel avatar Apr 26 '24 13:04 cadedaniel

@cadedaniel can we get this merged today?

richardliaw avatar May 03 '24 17:05 richardliaw

@richardliaw yep

@Yard1 I benchmarked and there is room to optimize. I feel we should follow up once we have E2E spec decode numbers (the implementation is reasonably efficient)

cadedaniel avatar May 03 '24 18:05 cadedaniel