Zijing Liu

6 comments by Zijing Liu

Hi, just wanted to check in: is there a plan to support per-request stats logging? For example: `{request_1: {ttit: 10, e2e_latency: 200}}`.
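A minimal Python sketch of the nested `{request_id: {metric: value}}` layout requested above. The class `PerRequestStatsLogger` and its `record`/`snapshot` methods are hypothetical and not part of vLLM's API. The metric names are copied from the example, and their units are left unspecified.

```python
from collections import defaultdict
from typing import Dict


class PerRequestStatsLogger:
    """Hypothetical per-request stats collector (not a vLLM API).

    Stores stats as {request_id: {metric_name: value}}.
    """

    def __init__(self) -> None:
        # request_id -> {metric_name -> value}
        self._stats: Dict[str, Dict[str, float]] = defaultdict(dict)

    def record(self, request_id: str, metric: str, value: float) -> None:
        # Recording the same metric twice overwrites the earlier value.
        self._stats[request_id][metric] = value

    def snapshot(self) -> Dict[str, Dict[str, float]]:
        # Return plain-dict copies so callers cannot mutate internal state.
        return {rid: dict(metrics) for rid, metrics in self._stats.items()}


logger = PerRequestStatsLogger()
logger.record("request_1", "ttit", 10)
logger.record("request_1", "e2e_latency", 200)
print(logger.snapshot())
# {'request_1': {'ttit': 10, 'e2e_latency': 200}}
```

Keying the outer dict by request ID keeps each request's metrics together, so a single request's stats can be looked up or emitted without scanning aggregate histograms.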

Please refer to this ticket: [#16121](https://github.com/vllm-project/vllm/issues/16121). bitsandbytes doesn't currently support `FusedMoE`.

Hi @sdavidbd, thanks for the PR. One additional point: for KV connectors that use an async API (e.g. `NIXLConnector` with `get_finished`), there are cases where fetching KV from the remote keeps failing, and in...

> First, just to clarify: this PR focuses on adding the infrastructure for automatic recovery on the _decoder side_ in the event of KV load failures. Specifically, when a connector...

Got it, @sdavidbd. Just to double-confirm: in the fully async case, when the `scheduler` handles a failed request (one whose KV loading has failed), it would reschedule that request to...