Jingyuan
### 🐛 Describe the bug requestTrace currently only reports data when requests are not using streaming. It should work for both stream and non-stream vLLM requests. ### Steps to...
### Summary Having access to the GPU profile used by the GPU optimizer, we propose to add a new routing policy that utilizes performance profiles per input/output token pattern to...
## Pull Request Description Refactoring for cache: 1. Merge multiple pod, model, and metric mappings by adding Pod metadata and Model metadata and using two main thread-safe registries for metadata....
### Summary Refactoring for cache: 1. Merge multiple pod, model, and metric mappings by adding Pod metadata and Model metadata and using two main thread-safe registries for metadata. 2. Eliminate...