[RFC]: Cache and Router refactoring for concurrent performance, concurrent safety and stateful routing.

Open zhangjyr opened this issue 9 months ago • 0 comments

Summary

Refactoring for cache:

  1. Merge the multiple pod, model, and metric mappings by adding Pod metadata and Model metadata, and use two main thread-safe registries for that metadata.
  2. Eliminate the global cache mutex and replace it with multiple layers of locks inside the metadata.

Refactoring for router:

  1. Eliminate thread-unsafe map access in the router interface.
  2. Merge the two contexts, context.Context and routing.RoutingContext, into a single RoutingContext.
  3. Add a queue router to enable per-model request reordering.
  4. Abstract the router interface and RoutingContext so that both the routing and cache packages can share them.

Motivation

Concurrency safety concern for cache and routing interaction:

type Router interface {
	// Route returns the target pod
	Route(ctx context.Context, pods map[string]*v1.Pod, routingCtx RoutingContext) (string, error)
}

type Cache struct {
	mu                sync.RWMutex
	...
	ModelToPodMapping map[string]map[string]*v1.Pod   // model_name: map[pod_name]*v1.Pod
	...
}

As shown above, the router interface takes a thread-unsafe map[string]*v1.Pod, which is itself stored in another thread-unsafe map, ModelToPodMapping, in the cache object. If the cache updates its pods while a router is reading that map, the Go runtime can abort with a concurrent map access fatal error.
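One common way to avoid this class of fault is to never hand the live map to callers and to return a copy taken under a lock instead. The following is a minimal sketch of that pattern, not the code proposed in this RFC: podInfo, modelPods, Snapshot, and concurrentDemo are hypothetical names, and podInfo stands in for *v1.Pod so the example has no Kubernetes dependency.

```go
package main

import (
	"fmt"
	"sync"
)

// podInfo is a hypothetical stand-in for *v1.Pod.
type podInfo struct {
	Name string
}

// modelPods (hypothetical) guards one model's pod set with its own lock
// instead of relying on a cache-wide mutex.
type modelPods struct {
	mu   sync.RWMutex
	pods map[string]*podInfo // pod_name: pod
}

func newModelPods() *modelPods {
	return &modelPods{pods: make(map[string]*podInfo)}
}

// Add inserts or replaces a pod under the write lock.
func (m *modelPods) Add(p *podInfo) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.pods[p.Name] = p
}

// Snapshot copies the current pods into a fresh slice under the read lock.
// Callers iterate the slice and never touch the live map.
func (m *modelPods) Snapshot() []*podInfo {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make([]*podInfo, 0, len(m.pods))
	for _, p := range m.pods {
		out = append(out, p)
	}
	return out
}

// concurrentDemo runs n writers and n readers at the same time and
// returns the number of pods stored once all goroutines have finished.
func concurrentDemo(n int) int {
	m := newModelPods()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func(i int) {
			defer wg.Done()
			m.Add(&podInfo{Name: fmt.Sprintf("pod-%d", i)})
		}(i)
		go func() {
			defer wg.Done()
			for range m.Snapshot() { // safe: iterates a private copy
			}
		}()
	}
	wg.Wait()
	return len(m.Snapshot())
}

func main() {
	fmt.Println(concurrentDemo(100))
}
```

Running this with `go run -race` should report no data race, whereas ranging over the shared map directly while another goroutine writes to it would.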

Concurrency performance concern for cache:

type Cache struct {
	mu                sync.RWMutex
	...
	metrics           map[string]interface{}
	ModelMetrics      map[string]map[string]interface{}
	Pods              map[string]*v1.Pod
	PodMetrics        map[string]map[string]metrics.MetricValue            // pod_name: map[metric_name]metric_val
	PodModelMetrics   map[string]map[string]map[string]metrics.MetricValue // pod_name: map[model_name]map[metric_name]metric_val
	PodToModelMapping map[string]map[string]struct{}                       // pod_name: map[model_name]struct{}
	ModelToPodMapping map[string]map[string]*v1.Pod                        // model_name: map[pod_name]*v1.Pod
	...
	pendingRequests   *sync.Map                                            // model_name: *int32
}

The current cache maintains eight maps to track the pod-model relationship and related metadata such as metrics, and more metadata may be added to support stateful routing. A redesign is needed.

Router interface redesign:

  1. Passing multiple context objects through the routing interface is redundant.
  2. The current routing policy supports FIFO routing only; we showcase a simple way to add pluggable request queue support that allows request reordering.

Proposed Change

[Image: UML diagram]

As shown in the UML, we propose:

  1. Use Pod and Model to store all metadata previously maintained in eight maps.
  2. Reduce the eight thread-unsafe global cache maps to two sync.Map wrappers (ignoring ModelGPUProfile for now).
  3. Redefine the router interface to:
     a. merge the request context (context.Context) and the routing context (RoutingContext);
     b. use an array-like PodArray in place of the pod map.
  4. Have PodArray support deployment-based heterogeneous GPUs.
  5. Add new APIs to the routing context to support request reordering.
  6. Add a queue router to showcase a pluggable, stateful, per-model router.
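The shape of the first four items can be sketched roughly as follows. This is an assumption-laden illustration, not the design in the UML: the field layout of Pod and Model, the Store type, and the AddPod, ListPodsByModel, and ByDeployment methods are hypothetical names, and the Deployment string stands in for whatever identifies a deployment in the real cache. The sync.Map calls used (LoadOrStore, Load, Store) are the standard-library ones.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Pod holds per-pod metadata that previously lived in several maps.
type Pod struct {
	Name       string
	Deployment string   // hypothetical: identifies the owning deployment
	Models     sync.Map // model_name: struct{}
	Metrics    sync.Map // metric_name: metric value
}

// Model holds per-model metadata, guarded by its own lock.
type Model struct {
	Name string
	mu   sync.RWMutex
	pods map[string]*Pod // pod_name: *Pod
}

// PodArray is a read-only snapshot handed to routers instead of a live map.
type PodArray struct {
	Pods []*Pod
}

func (a *PodArray) Len() int { return len(a.Pods) }

// ByDeployment groups pods by deployment, so a router can treat pods that
// share hardware (for example, the same GPU type) as one group.
func (a *PodArray) ByDeployment() map[string][]*Pod {
	groups := make(map[string][]*Pod)
	for _, p := range a.Pods {
		groups[p.Deployment] = append(groups[p.Deployment], p)
	}
	return groups
}

// Store keeps the two registries: one for pods, one for models.
type Store struct {
	pods   sync.Map // pod_name: *Pod
	models sync.Map // model_name: *Model
}

// AddPod registers a pod and links it to each model it serves.
func (s *Store) AddPod(name, deployment string, models ...string) {
	v, _ := s.pods.LoadOrStore(name, &Pod{Name: name, Deployment: deployment})
	pod := v.(*Pod)
	for _, m := range models {
		pod.Models.Store(m, struct{}{})
		mv, _ := s.models.LoadOrStore(m, &Model{Name: m, pods: make(map[string]*Pod)})
		model := mv.(*Model)
		model.mu.Lock() // per-model lock, not a global cache lock
		model.pods[name] = pod
		model.mu.Unlock()
	}
}

// ListPodsByModel returns a snapshot of a model's pods, sorted by name.
func (s *Store) ListPodsByModel(model string) (*PodArray, bool) {
	v, ok := s.models.Load(model)
	if !ok {
		return nil, false
	}
	m := v.(*Model)
	m.mu.RLock()
	pods := make([]*Pod, 0, len(m.pods))
	for _, p := range m.pods {
		pods = append(pods, p)
	}
	m.mu.RUnlock()
	sort.Slice(pods, func(i, j int) bool { return pods[i].Name < pods[j].Name })
	return &PodArray{Pods: pods}, true
}

func main() {
	s := &Store{}
	s.AddPod("p1", "deploy-a", "llama")
	s.AddPod("p2", "deploy-b", "llama")
	s.AddPod("p3", "deploy-a", "llama")
	arr, _ := s.ListPodsByModel("llama")
	fmt.Println(arr.Len(), len(arr.ByDeployment()["deploy-a"]))
}
```

Because routers receive a PodArray snapshot, a pod update during routing only contends on the lock of the affected model rather than blocking the whole cache.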

Alternatives Considered

No response

zhangjyr, Mar 14 '25 23:03