[Speculative decoding] Support target-model logprobs
This PR allows vLLM to return correct log-probabilities of sampled tokens when speculative decoding is enabled. In addition, if the user specifies `logprobs` in their request, the correct top-k logprobs are returned.
The returned log-probabilities are expected to equal those produced when speculative decoding is not used.
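For context, here is a minimal sketch (not vLLM's actual implementation) of the invariant: the logprobs returned to the user are computed from the target model's distribution via log-softmax, so they are identical whether or not the token was proposed speculatively.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of raw logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def topk_logprobs(logits, k):
    # Return top-k (token_id, logprob) pairs, as a server would when
    # the user requests `logprobs=k`. The values depend only on the
    # target model's logits, not on how the token was proposed.
    lp = log_softmax(logits)
    return sorted(enumerate(lp), key=lambda t: t[1], reverse=True)[:k]

# Hypothetical target-model logits for a tiny 3-token vocabulary.
target_logits = [2.0, 1.0, 0.1]
print(topk_logprobs(target_logits, 2))
```

Because the same `target_logits` are used on both the speculative and non-speculative paths, the returned logprobs match exactly.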
Testing
- See https://github.com/vllm-project/vllm/pull/4378/files#diff-2d36a32d508a5729c33b7ef42e285d9f382ca997c10b437b005b15390b0450cb
btw, heads up: there will be a big sampler refactoring in this PR: https://github.com/vllm-project/vllm/pull/4309
Thanks for the heads up; I think I can keep it decoupled.
@cadedaniel can we get this merged today?
@richardliaw yep
@Yard1 I benchmarked and there is room to optimize. I feel we should follow up once we have E2E spec decode numbers (the implementation is reasonably efficient).