aibrix
[RFC]: Cache and Router refactoring for concurrent performance, concurrent safety and stateful routing.
Summary
Refactoring for cache:
- Merge the multiple pod, model, and metric mappings by adding Pod metadata and Model metadata, and use two main thread-safe registries to hold that metadata.
- Eliminate the global cache mutex and replace it with multiple layers of locks inside the metadata.
Refactoring for router:
- Eliminate thread-unsafe map access in the router interface.
- Merge the two contexts, context.Context and routing.RoutingContext, into a single RoutingContext.
- Add a queue router to enable per-model request reordering.
- Abstract the router interface and RoutingContext so that both the routing and cache packages can share them.
Motivation
Concurrency safety concern for cache and routing interaction:
type Router interface {
	// Route returns the target pod
	Route(ctx context.Context, pods map[string]*v1.Pod, routingCtx RoutingContext) (string, error)
}

type Cache struct {
	mu sync.RWMutex
	...
	ModelToPodMapping map[string]map[string]*v1.Pod // model_name: map[pod_name]*v1.Pod
	...
}
As shown above, the router interface takes a thread-unsafe map[string]*v1.Pod, which is itself stored in another thread-unsafe map, ModelToPodMapping, in the cache object. If the cache updates its pods while a router is still reading that map, the Go runtime can abort the process with a fatal concurrent map access error.
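One way to picture the hazard and a possible remedy is the minimal sketch below. It is not the code proposed in this RFC: `podIndex`, `newPodIndex`, `Add`, and `Snapshot` are hypothetical names, and plain strings stand in for `*v1.Pod` so the example has no Kubernetes dependency. The idea is that the router receives a freshly copied slice rather than the live map, so later cache updates cannot race with it.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// podIndex is a hypothetical stand-in for the cache's model-to-pod mapping.
// Pod addresses are plain strings here instead of *v1.Pod.
type podIndex struct {
	mu   sync.RWMutex
	pods map[string]map[string]string // model_name: map[pod_name]address
}

func newPodIndex() *podIndex {
	return &podIndex{pods: map[string]map[string]string{}}
}

// Add registers a pod for a model under the write lock.
func (p *podIndex) Add(model, pod, addr string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.pods[model] == nil {
		p.pods[model] = map[string]string{}
	}
	p.pods[model][pod] = addr
}

// Snapshot copies the pod names for a model into a new, sorted slice while
// holding the read lock. The caller owns the slice, so later calls to Add
// cannot race with code that iterates over it.
func (p *podIndex) Snapshot(model string) []string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	names := make([]string, 0, len(p.pods[model]))
	for name := range p.pods[model] {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	idx := newPodIndex()
	idx.Add("llama", "pod-a", "10.0.0.1")

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); idx.Add("llama", "pod-b", "10.0.0.2") }() // cache update
	go func() { defer wg.Done(); _ = idx.Snapshot("llama") }()             // router read
	wg.Wait()

	fmt.Println(idx.Snapshot("llama"))
}
```

Copying on every request costs an allocation, so a real implementation might cache the snapshot and rebuild it only when pods change.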
Concurrency performance concern for cache:
type Cache struct {
	mu sync.RWMutex
	...
	metrics           map[string]interface{}
	ModelMetrics      map[string]map[string]interface{}
	Pods              map[string]*v1.Pod
	PodMetrics        map[string]map[string]metrics.MetricValue            // pod_name: map[metric_name]metric_val
	PodModelMetrics   map[string]map[string]map[string]metrics.MetricValue // pod_name: map[model_name]map[metric_name]metric_val
	PodToModelMapping map[string]map[string]struct{}                       // pod_name: map[model_name]struct{}
	ModelToPodMapping map[string]map[string]*v1.Pod                        // model_name: map[pod_name]*v1.Pod
	...
	pendingRequests *sync.Map // model_name: *int32
}
The current cache maintains eight maps to track the pod-model relationship and related metadata such as metrics, all guarded by a single global mutex. More metadata will likely be added to the cache to support stateful routing, so a redesign is due.
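To make the contrast with a single global lock concrete, here is a minimal sketch of a registry with per-entry locking. The names `Pod`, `PodRegistry`, `LoadOrCreate`, `SetMetric`, and `Metric` are assumptions made for illustration, and `float64` stands in for `metrics.MetricValue`; the actual types may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical per-pod metadata record. Its own lock guards the
// maps below, so updating one pod does not block readers of another pod.
type Pod struct {
	mu      sync.RWMutex
	Name    string
	Models  map[string]struct{} // model names served by this pod
	Metrics map[string]float64  // metric_name: metric_val
}

// SetMetric updates one metric under the pod-level write lock.
func (p *Pod) SetMetric(name string, v float64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.Metrics[name] = v
}

// Metric reads one metric under the pod-level read lock.
func (p *Pod) Metric(name string) (float64, bool) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	v, ok := p.Metrics[name]
	return v, ok
}

// PodRegistry is a hypothetical typed wrapper around sync.Map (pod_name: *Pod).
type PodRegistry struct{ m sync.Map }

// LoadOrCreate returns the existing record for name, or atomically stores
// and returns a new one if none exists yet.
func (r *PodRegistry) LoadOrCreate(name string) *Pod {
	if v, ok := r.m.Load(name); ok {
		return v.(*Pod)
	}
	fresh := &Pod{Name: name, Models: map[string]struct{}{}, Metrics: map[string]float64{}}
	v, _ := r.m.LoadOrStore(name, fresh)
	return v.(*Pod)
}

func main() {
	var reg PodRegistry
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Concurrent writers contend only on this pod's lock.
			reg.LoadOrCreate("pod-a").SetMetric("running_requests", float64(i))
		}(i)
	}
	wg.Wait()
	_, ok := reg.LoadOrCreate("pod-a").Metric("running_requests")
	fmt.Println("metric present:", ok)
}
```

A Model registry could follow the same shape, with each record holding references to its pods instead of a separate model-to-pod map.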
Router interface redesign:
- Passing multiple context objects through the routing interface is redundant.
- The current routing policy supports FIFO routing only; we showcase a simple way to add pluggable request queue support that allows request reordering.
Proposed Change
As shown in the UML, we propose:
- Using Pod and Model to store all metadata previously maintained in eight maps.
- Reduce the eight thread-unsafe global cache maps to two sync.Map wrappers (ignoring ModelGPUProfile for now).
- Redefine the router interface to:
  a. merge the request context (context.Context) and the routing context (RoutingContext);
  b. use an array-like PodArray in place of the pod map.
- PodArray supports deployment-based heterogeneous GPUs.
- Add new APIs to the routing context to support request reordering.
- Add a queue router to showcase a pluggable, stateful, per-model router.
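As an illustration of the last two points, the sketch below keeps one priority queue per model so that requests can leave in a different order than they arrived. It is only one possible design: `QueueRouter`, `Enqueue`, `Dequeue`, and the integer `Priority` field are hypothetical, and the RFC does not specify the reordering criterion.

```go
package main

import (
	"container/heap"
	"fmt"
	"sync"
)

// request is a hypothetical queued routing request.
type request struct {
	ID       string
	Priority int    // lower value is dispatched first (assumed criterion)
	seq      uint64 // arrival order, used as a FIFO tie-breaker
}

// requestHeap implements heap.Interface ordered by (Priority, seq).
type requestHeap []*request

func (h requestHeap) Len() int { return len(h) }
func (h requestHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority < h[j].Priority
	}
	return h[i].seq < h[j].seq
}
func (h requestHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *requestHeap) Push(x any)   { *h = append(*h, x.(*request)) }
func (h *requestHeap) Pop() any {
	old := *h
	n := len(old)
	r := old[n-1]
	old[n-1] = nil // drop the reference so it can be collected
	*h = old[:n-1]
	return r
}

// QueueRouter holds one queue per model, which is the per-model state.
type QueueRouter struct {
	mu     sync.Mutex
	queues map[string]*requestHeap // model_name: pending requests
	seq    uint64
}

func NewQueueRouter() *QueueRouter {
	return &QueueRouter{queues: map[string]*requestHeap{}}
}

// Enqueue adds a request to its model's queue.
func (q *QueueRouter) Enqueue(id, model string, priority int) {
	q.mu.Lock()
	defer q.mu.Unlock()
	h, ok := q.queues[model]
	if !ok {
		h = &requestHeap{}
		q.queues[model] = h
	}
	q.seq++
	heap.Push(h, &request{ID: id, Priority: priority, seq: q.seq})
}

// Dequeue returns the next request ID for a model, or false if none is pending.
func (q *QueueRouter) Dequeue(model string) (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	h, ok := q.queues[model]
	if !ok || h.Len() == 0 {
		return "", false
	}
	return heap.Pop(h).(*request).ID, true
}

func main() {
	q := NewQueueRouter()
	q.Enqueue("r1", "llama", 5)
	q.Enqueue("r2", "llama", 1) // arrives later but is dispatched first
	q.Enqueue("r3", "llama", 5)
	for {
		id, ok := q.Dequeue("llama")
		if !ok {
			break
		}
		fmt.Println(id) // prints r2, then r1, then r3
	}
}
```

Swapping the `Less` method is enough to change the policy, which is what makes the queue pluggable; with equal priorities the tie-breaker reduces it to plain FIFO.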
Alternatives Considered
No response