Ning
### 🐛 Describe the bug
It seems that the Python client-side token count does not match the vLLM token-counting results. The difference is about 300 for input...
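One common cause of this kind of mismatch (an assumption here, not confirmed by the issue) is that the serving engine tokenizes the prompt after applying a chat template, while the client counts tokens on the raw prompt only. The sketch below illustrates the effect with a toy whitespace tokenizer; the template string and token counts are illustrative, not vLLM's actual behavior.

```python
def count_tokens(text: str) -> int:
    # Toy whitespace tokenizer; real counts come from the model's tokenizer.
    return len(text.split())

prompt = "Hi " * 1000
client_count = count_tokens(prompt)  # what the client sees: 1000

# Hypothetical chat template a serving engine might wrap around the prompt
# before tokenizing; these wrapper tokens are invisible to the client.
templated = "<|system|> You are a helpful assistant. <|user|> " + prompt + " <|assistant|>"
server_count = count_tokens(templated)

overhead = server_count - client_count
print(overhead)  # template tokens the client never counted
```

A real diagnosis would compare the client's tokenizer output against the `usage.prompt_tokens` field returned by the server for the same request.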
### 🐛 Describe the bug
In our current GPU-benchmarking scripts, we always use the prompt "Hi Hi Hi ..." to test model performance. The deepseek-coder-7b model always returns a long enough...
### 🚀 Feature Description and Motivation
Based on the experiments conducted so far, we have identified the following issues that need to be addressed to ensure the GPU optimizer fully...
### 🚀 Feature Description and Motivation
The autoscaler should support scaling down to 0. When a new request arrives, an activator component should intercept the request and initialize...
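The activator pattern described above can be sketched as follows. This is a minimal, single-threaded illustration under assumed semantics (buffer requests while replicas are at zero, trigger scale-up, flush once a replica is available); the class and method names are hypothetical, not part of the proposed design.

```python
class Activator:
    """Sketch: intercepts requests for a target scaled to zero replicas,
    triggers scale-up, buffers the request, and flushes once ready."""

    def __init__(self, backend):
        self.backend = backend   # callable standing in for the real model server
        self.replicas = 0        # the target starts scaled down to zero
        self.buffer = []         # requests held during cold start

    def handle(self, request):
        if self.replicas == 0:
            self.buffer.append(request)
            self.replicas = 1    # stand-in for an async call to the autoscaler
            return self._flush() # in reality this waits for pod readiness
        return [self.backend(request)]

    def _flush(self):
        served = [self.backend(r) for r in self.buffer]
        self.buffer.clear()
        return served

activator = Activator(backend=lambda r: f"served:{r}")
print(activator.handle("req-1"))  # cold start: buffered, scaled up, then served
print(activator.handle("req-2"))  # warm path: forwarded directly
```

A production activator would additionally need readiness probing, request timeouts, and concurrency-safe buffering, which this sketch omits.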
## Pull Request Description
This PR adds benchmarking support using real user-application prompt traces for the heterogeneous GPU story. This is a general benchmarking method that works for all...