Jingyuan issues

Results: 4 issues of Jingyuan

requestTrace in the cache of gateway plugin does not support stream request

### 🐛 Describe the bug requestTrace currently only reports data when requests are not using stream. It should work for both stream and non-stream vLLM requests. ### Steps to...

[RFC]: Load-aware pattern-based routing policy with profile support

### Summary Having access to the GPU profile used by the GPU optimizer, we propose to add a new routing policy that utilizes performance profiles per input/output token pattern to...

kind/enhancement

priority/important-soon

area/heterogeneous

[Misc][API] Cache and Router refactoring for concurrent performance, concurrent safety and stateful routing.

## Pull Request Description Refactoring for cache: 1. Merge multiple pod, model, and metric mappings by adding Pod metadata and Model metadata and using two main thread-safe registries for metadata....

[RFC]: Cache and Router refactoring for concurrent performance, concurrent safety and stateful routing.

### Summary Refactoring for cache: 1. Merge multiple pod, model, and metric mappings by adding Pod metadata and Model metadata and using two main thread-safe registries for metadata. 2. Eliminate...

area/gateway

priority/critical-urgent

area/heterogeneous