Mark McLoughlin

50 comments by Mark McLoughlin

> The only request is to add 1 to the mean acceptance length, since one token will always be accepted. So mean acceptance length is essentially "average number of...

> Hi @markmc, as far as I know, all speculative decoding literature that reports acceptance length includes the bonus token, since this quantity aligns with "number of tokens generated per forward...
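
The convention described in the two quoted comments above can be sketched in a few lines. This is only an illustration, assuming that each draft/verification step always emits exactly one bonus token on top of whatever draft tokens are accepted; the function and parameter names (`mean_acceptance_length`, `num_accepted_tokens`, `num_drafts`) are hypothetical and not taken from the PR.

```python
def mean_acceptance_length(num_accepted_tokens: int, num_drafts: int) -> float:
    """Average number of tokens generated per verification step.

    Assumption: one bonus token is always produced per step, so the
    result is 1 plus the average number of accepted draft tokens.
    """
    if num_drafts <= 0:
        raise ValueError("num_drafts must be positive")
    return 1.0 + num_accepted_tokens / num_drafts


# 150 accepted draft tokens over 100 drafts: 1 + 1.5
print(mean_acceptance_length(num_accepted_tokens=150, num_drafts=100))  # 2.5
```

With this definition the value can never drop below 1, even when every draft token is rejected.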

I've pushed an update that I'm not super happy with. To handle the case of DP where we have multiple sets of metrics identified by `engine_idx`, I've had to do...

> @markmc Is this PR waiting for review? Or is it in progress?

It is waiting for review

> LGTM, @markmc could you just double check if the CI failure is related so that we can merge this PR?

Yes, AFAICT all of these failures are happening on...

> @markmc Can you please merge from main again?

Done. I don't think the rebase resolves any of the test failures, but I could be wrong

Ok, the docs failure was a genuine - but hard-to-spot - issue with the PR:

```
vllm/docs/source/serving/engine_args.md:14: ERROR: Failed to import "_engine_args_parser" from "vllm.engine.arg_utils". No module named 'prometheus_client'
```

`start_container.sh` needs this too, perhaps with `:Z` for the model checkpoints dir since it would be shared between containers?

> `start_container.sh` needs this too, perhaps with `:Z` for the model checkpoints dir since it would be shared between containers?

Also, `build_container.sh` - and maybe for that, since it is...

Couple of points:
- For new metrics, the priority should be to add them in V1 since V0 will shortly be deprecated
- Are `FINISHED_ABORTED` requests already counted under `request_success_total[length]`...