[Speculative decoding] Support target-model logprobs
This PR allows vLLM to return correct log-probabilities of sampled tokens when speculative decoding is enabled. In addition, if the user specifies `logprobs` in their request, the correct top-k logprobs are returned.
The returned log-probabilities are expected to equal those produced when speculative decoding is not used.
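For context, here is a minimal sketch (not vLLM's actual implementation) of the invariant: the logprobs returned to the user are computed from the target model's distribution via log-softmax, so they are identical whether or not the token was proposed speculatively.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of raw logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def topk_logprobs(logits, k):
    # Return top-k (token_id, logprob) pairs, as a server would when
    # the user requests `logprobs=k`. The values depend only on the
    # target model's logits, not on how the token was proposed.
    lp = log_softmax(logits)
    return sorted(enumerate(lp), key=lambda t: t[1], reverse=True)[:k]

# Hypothetical target-model logits for a tiny 3-token vocabulary.
target_logits = [2.0, 1.0, 0.1]
print(topk_logprobs(target_logits, 2))
```

Because the same `target_logits` are used on both the speculative and non-speculative paths, the returned logprobs match exactly.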
Testing
- See https://github.com/vllm-project/vllm/pull/4378/files#diff-2d36a32d508a5729c33b7ef42e285d9f382ca997c10b437b005b15390b0450cb
btw, heads up: there will be a big sampler refactoring in this PR: https://github.com/vllm-project/vllm/pull/4309
Thanks for the heads up; I think I can keep it decoupled.
@cadedaniel can we get this merged today?
@richardliaw yep
@Yard1 I benchmarked and there is room to optimize. I feel we should follow up once we have E2E spec decode numbers (the implementation is reasonably efficient).