Jiaxin Shan
/assign @varungup90
For features that rely on usage statistics, can we add a note in the documentation asking users to enable it explicitly? The heterogeneous feature needs it as well. By default, it should be clean.
@varungup90 could you give more suggestions on the TPM check? Let's get @gau-nernst on board.
Great to see you here @cheyang, long time no see. Yeah, @gaocegege gave the code pointer in vLLM. 1. The vLLM code will be refactored to adapt to the v1 architecture and...
@nechamab1 we released the latest version of KV cache support. Please check the documentation for more details. All the code is now hosted in this repo: https://github.com/vllm-project/aibrix/blob/main/python/aibrix_kvcache/integration/vllm/vllm_v0.8.5-aibrix-kvcache.patch This version improves the performance...
@xieus Looks like you rebased the code in a different way and all commit hashes were regenerated. I think it's easier to cut a new PR on your side. WDYT?...
Seems there's no progress here, so I will close this change for now. Feel free to reopen and rebase onto master once you come back to this story.
@kr11 yes, the `gpu_optimizer` exposes an endpoint and the returned metric looks like `vllm:deployment_replicas = 1`. My point was that we used to transform a metric into deployment replicas, but now it...
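As a rough illustration of the metric-to-replicas transformation described above, here is a minimal sketch that parses a Prometheus-style text sample like the `vllm:deployment_replicas` line into an integer replica count. The metric name comes from the comment; the label set and helper name are illustrative assumptions, not the actual `gpu_optimizer` API.

```python
# Hypothetical sketch: extract a replica count from Prometheus text-format
# output. The `vllm:deployment_replicas` name is from the comment above;
# the label content here is an invented example.
import re

def parse_replicas(exposition_text: str,
                   metric_name: str = "vllm:deployment_replicas") -> int:
    """Return the value of the first sample of `metric_name`, or -1 if absent."""
    pattern = re.compile(
        rf"^{re.escape(metric_name)}(?:\{{[^}}]*\}})?\s+([0-9.]+)\s*$"
    )
    for line in exposition_text.splitlines():
        m = pattern.match(line.strip())
        if m:
            return int(float(m.group(1)))
    return -1

sample = 'vllm:deployment_replicas{deployment="llama2-7b"} 1'
print(parse_replicas(sample))  # -> 1
```

A scaler could poll the endpoint, run a parser like this, and patch the deployment's replica field with the result, rather than transforming a raw utilization metric itself.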
@zhangjyr @nwangfw what's the status of this issue?
We can discuss more details on the webhook usage. In the examples, we just use Hugging Face models for simplicity. However, in the real world, most users have to fetch weights from...