Ning
### 🐛 Describe the bug
It seems that the Python client-side token count does not match the vLLM token-counting results. The difference is about 300 for input...
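One common cause of this kind of mismatch (an assumption here, not confirmed by the issue) is that the serving engine tokenizes the prompt after applying a chat template, while the client counts tokens on the raw prompt only. The sketch below illustrates the effect with a toy whitespace tokenizer; the template string and token counts are illustrative, not vLLM's actual behavior.

```python
def count_tokens(text: str) -> int:
    # Toy whitespace tokenizer; real counts come from the model's tokenizer.
    return len(text.split())

prompt = "Hi " * 1000
client_count = count_tokens(prompt)  # what the client sees: 1000

# Hypothetical chat template a serving engine might wrap around the prompt
# before tokenizing; these wrapper tokens are invisible to the client.
templated = "<|system|> You are a helpful assistant. <|user|> " + prompt + " <|assistant|>"
server_count = count_tokens(templated)

overhead = server_count - client_count
print(overhead)  # template tokens the client never counted
```

A real diagnosis would compare the client's tokenizer output against the `usage.prompt_tokens` field returned by the server for the same request.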
### 🐛 Describe the bug
In our current GPU-benchmarking scripts, we always use the prompt "Hi Hi Hi ..." to test model performance. The deepseek-coder-7b model always returns a long enough...
### 🚀 Feature Description and Motivation
Based on the experiments conducted so far, we have identified the following issues that need to be addressed to ensure the GPU optimizer fully...
### 🚀 Feature Description and Motivation
The autoscaler should support scaling down to 0. When a new request arrives, an activator component should intercept the request and initialize...
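The activator pattern described above can be sketched as follows. This is a minimal, single-threaded illustration under assumed semantics (buffer requests while replicas are at zero, trigger scale-up, flush once a replica is available); the class and method names are hypothetical, not part of the proposed design.

```python
class Activator:
    """Sketch: intercepts requests for a target scaled to zero replicas,
    triggers scale-up, buffers the request, and flushes once ready."""

    def __init__(self, backend):
        self.backend = backend   # callable standing in for the real model server
        self.replicas = 0        # the target starts scaled down to zero
        self.buffer = []         # requests held during cold start

    def handle(self, request):
        if self.replicas == 0:
            self.buffer.append(request)
            self.replicas = 1    # stand-in for an async call to the autoscaler
            return self._flush() # in reality this waits for pod readiness
        return [self.backend(request)]

    def _flush(self):
        served = [self.backend(r) for r in self.buffer]
        self.buffer.clear()
        return served

activator = Activator(backend=lambda r: f"served:{r}")
print(activator.handle("req-1"))  # cold start: buffered, scaled up, then served
print(activator.handle("req-2"))  # warm path: forwarded directly
```

A production activator would additionally need readiness probing, request timeouts, and concurrency-safe buffering, which this sketch omits.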
## Pull Request Description
This PR adds benchmarking support using real user-application prompt traces for the heterogeneous GPU story. This is a general benchmarking method that works for all...