Add spec-infer-related metrics to the Prometheus metrics.

Open leiwen83 opened this issue 1 year ago • 9 comments

This also adds a new boost_ratio metric that directly shows how many decoding steps spec infer saves.

leiwen83 avatar May 03 '24 15:05 leiwen83

cc @cadedaniel

leiwen83 avatar May 03 '24 15:05 leiwen83

will take a look Monday. btw, how is this different from the system efficiency metric? (boost ratio == (num_spec_tokens + 1) * system efficiency?)

cadedaniel avatar May 03 '24 18:05 cadedaniel

> will take a look Monday. btw, how is this different from the system efficiency metric? (boost ratio == (num_spec_tokens + 1) * system efficiency?)

+1

robertgshaw2-redhat avatar May 03 '24 19:05 robertgshaw2-redhat

Thanks for the contribution! It would be great to have these metrics flowing through prometheus!

robertgshaw2-redhat avatar May 03 '24 19:05 robertgshaw2-redhat

> will take a look Monday. btw, how is this different from the system efficiency metric? (boost ratio == (num_spec_tokens + 1) * system efficiency?)

The new boost_ratio expresses more accurately how much the system benefits from spec infer, because there are cases where spec infer gives no proposal at all, such as when the ngram lookup finds no match or when seqlen + spec would exceed the model length.

Furthermore, with the new dynamic spec coming (https://github.com/vllm-project/vllm/issues/4565), k would no longer be a constant, so we may need to accumulate the number of tokens actually emitted and compare it with the number of steps.
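
To make the accumulation idea concrete, here is a minimal sketch of how such a ratio could be tracked. It is only an illustration under stated assumptions, not the code in the PR: the class name `SpecDecodeCounters`, its fields, and `record_step` are all hypothetical, and boost_ratio is assumed to mean tokens emitted divided by decoding steps.

```python
from dataclasses import dataclass


@dataclass
class SpecDecodeCounters:
    """Hypothetical running counters; the names are illustrative, not vLLM's."""

    num_steps: int = 0           # decoding steps taken by the target model
    num_spec_tokens: int = 0     # sum of k over all steps (k may vary per step)
    num_emitted_tokens: int = 0  # tokens actually emitted to the sequences

    def record_step(self, k: int, num_accepted: int) -> None:
        """Record one step that proposed k draft tokens and accepted num_accepted."""
        if not 0 <= num_accepted <= k:
            raise ValueError("num_accepted must be between 0 and k")
        self.num_steps += 1
        self.num_spec_tokens += k
        # A step emits the accepted draft tokens plus one token from the target model.
        self.num_emitted_tokens += num_accepted + 1

    @property
    def boost_ratio(self) -> float:
        """Assumed definition: average number of tokens emitted per decoding step."""
        if self.num_steps == 0:
            return 0.0
        return self.num_emitted_tokens / self.num_steps


counters = SpecDecodeCounters()
counters.record_step(k=4, num_accepted=4)  # all proposals accepted: 5 tokens
counters.record_step(k=4, num_accepted=1)  # one proposal accepted: 2 tokens
counters.record_step(k=0, num_accepted=0)  # no proposal (e.g. no ngram match): 1 token
print(f"{counters.boost_ratio:.2f}")       # prints 2.67
```

Because the step with no proposal still counts as a step, the ratio drops toward 1.0 when spec infer often has nothing to propose, and no fixed k appears anywhere in the calculation.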

leiwen83 avatar May 04 '24 02:05 leiwen83

@cadedaniel @robertgshaw2-neuralmagic Any comments on the latest PR change? :)

leiwen83 avatar May 08 '24 14:05 leiwen83

asking @LiuXiaoxuanPKU if she has bandwidth to review the PR. the approach looks good to me, concerns are 1) we should make sure the top-level metrics make sense to users (not just to us as developers), 2) the naming of the metrics collection seems weird

cadedaniel avatar May 09 '24 21:05 cadedaniel

reviewed

cade + i discussing a path fwd

robertgshaw2-redhat avatar May 09 '24 21:05 robertgshaw2-redhat

Hi @robertgshaw2-neuralmagic @cadedaniel ,

How is it going with the spec-related metrics? Have we reached a conclusion on how to make this happen? ;) The metric is critical to us as direct feedback on how well the current spec system is doing.

leiwen83 avatar May 17 '24 10:05 leiwen83

thanks & sorry this slipped. I might have time tomorrow to finish review. cc @LiuXiaoxuanPKU and @comaniac who might have bandwidth.

cadedaniel avatar May 23 '24 17:05 cadedaniel

@cadedaniel I submitted a rebased PR, which keeps the concat logic as before. num_spec now aggregates the "k" values.

leiwen83 avatar Jun 07 '24 02:06 leiwen83