Jiaxin Shan
/assign @varungup90
For features that rely on usage statistics, can we add a note in the documentation asking users to enable it explicitly? The heterogeneous feature needs it as well. By default, it should be clean.
@varungup90 could you give more suggestions on the TPM check? Let's get @gau-nernst on board.
Great to see you here @cheyang, long time no see. Yeah, @gaocegege gave the code pointer in vLLM. 1. The vLLM code will be refactored to adapt to the v1 architecture and...
@nechamab1 we released the latest version of KV cache support. Please check the documentation for more details. All the code is now hosted in this repo: https://github.com/vllm-project/aibrix/blob/main/python/aibrix_kvcache/integration/vllm/vllm_v0.8.5-aibrix-kvcache.patch This version improves the performance...
@xieus Looks like you rebased the code in a different way and all commit hashes were regenerated. I think it's easier to cut a new PR on your side. WDYT?...
Seems there's no progress here, so I will close this change for now. Feel free to reopen and rebase onto master once you come back to this story.
@kr11 yes, the `gpu_optimizer` exposes an endpoint and the returned metric looks like `vllm:deployment_replicas = 1`. My point was that we used to transform a metric into deployment replicas, but now it...
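As a rough illustration of the metric-to-replicas transformation described above, here is a minimal sketch that parses a Prometheus-style text sample like the `vllm:deployment_replicas` line into an integer replica count. The metric name comes from the comment; the label set and helper name are illustrative assumptions, not the actual `gpu_optimizer` API.

```python
# Hypothetical sketch: extract a replica count from Prometheus text-format
# output. The `vllm:deployment_replicas` name is from the comment above;
# the label content here is an invented example.
import re

def parse_replicas(exposition_text: str,
                   metric_name: str = "vllm:deployment_replicas") -> int:
    """Return the value of the first sample of `metric_name`, or -1 if absent."""
    pattern = re.compile(
        rf"^{re.escape(metric_name)}(?:\{{[^}}]*\}})?\s+([0-9.]+)\s*$"
    )
    for line in exposition_text.splitlines():
        m = pattern.match(line.strip())
        if m:
            return int(float(m.group(1)))
    return -1

sample = 'vllm:deployment_replicas{deployment="llama2-7b"} 1'
print(parse_replicas(sample))  # -> 1
```

A scaler could poll the endpoint, run a parser like this, and patch the deployment's replica field with the result, rather than transforming a raw utilization metric itself.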
@zhangjyr @nwangfw what's the status of this issue?
We can discuss more details on the webhook usage. In the examples, we just use Hugging Face models for simplicity. However, in the real world, most users have to fetch weights from...