Erms: Efficient Resource Management for Shared Microservices with SLA Guarantees
https://dl.acm.org/doi/pdf/10.1145/3567955.3567964
https://www.youtube.com/watch?v=Vnex7mPM3S8
microservice resource management
- calls between microservices triggered by a user's request form a graph
- services are sensitive to end-to-end (e2e) latency
- microservices are overprovisioned to ensure performance
challenges:
- difficult to do fine-grained scaling while satisfying SLA requirements. The microservice call graph is complicated, and e2e latency depends on all microservices within the graph.
- microservice multiplexing. 40% of microservices are shared by >=100 online services. Services have diverse workload patterns and different SLA requirements.
- resource interference is non-uniform. There are performance imbalances between containers of the same microservice. A microservice usually has hundreds to thousands of containers in a cluster.
solution
-
step1: modeling microservice latency [Figure 3, sec2.2, sec5.2]
- Addresses Challenges 1 and 3
- microservice latency can be formulated as a piecewise linear function; the slope and cut-off point depend on workload and resource interference
- For each microservice, Erms
- performs offline profiling,
- collects historical data including tail latency, workload (number of calls processed), CPU utilization, and memory utilization
- applies an ML model (decision tree) to learn the parameters of the piecewise linear function, determining the cut-off point (σ_i) between the two linear pieces
- This model captures how the microservice's latency changes with workload, and how this relationship is affected by resource utilization (CPU and memory) [sec5.2]; a fitting sketch follows
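A minimal sketch of that fitting step, assuming scikit-learn: a depth-1 regression tree picks the cut-off σ_i, and each side is fit with least squares. The feature set and synthetic data are illustrative assumptions, and the paper additionally conditions the parameters on CPU/memory utilization, which this sketch omits.

```python
# Sketch: fit a piecewise linear latency model from offline profiling data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_piecewise_latency(workload, tail_latency):
    """Learn the cut-off sigma and per-piece (slope, intercept) for one microservice."""
    # A depth-1 regression tree picks the single split on workload that best
    # separates the two linear regimes -- that split is the cut-off point.
    tree = DecisionTreeRegressor(max_depth=1).fit(workload.reshape(-1, 1), tail_latency)
    sigma = tree.tree_.threshold[0]
    pieces = []
    for mask in (workload <= sigma, workload > sigma):
        # Least-squares fit of latency = a * workload + b on each side of sigma.
        a, b = np.polyfit(workload[mask], tail_latency[mask], 1)
        pieces.append((a, b))
    return sigma, pieces

# Synthetic profile whose slope steepens past workload 500 (for illustration only).
wl = np.linspace(1, 1000, 200)
lat = np.where(wl <= 500, 0.01 * wl + 2, 0.05 * wl - 18) + np.random.rand(200) * 0.1
sigma, ((a1, b1), (a2, b2)) = fit_piecewise_latency(wl, lat)
print(f"cut-off {sigma:.0f}; piece1 {a1:.3f}w+{b1:.2f}; piece2 {a2:.3f}w+{b2:.2f}")
```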
-
step2: latency target allocation [sec4.2, sec5.3.1]
- Addresses Challenge 1
- Latency targets in Erms represent the maximum time allocated to each microservice to process a request while still meeting the overall end-to-end latency SLA [sec2.2]
- It first simplifies the complex dependency graph into a chain with only sequential dependencies, composed of virtual microservices
- uses DFS to simplify the microservice graph: merge microservices with sequential or parallel calls into virtual microservices [sec4.2, Algorithm 1]
- It then computes latency targets for all microservices by solving a convex optimization problem that minimizes total resource usage while meeting SLA requirements.
- Objective Function: Minimize the total resource usage: min Σ(n_i * R_i) [equation 2]
- Constraint: Ensure the end-to-end latency meets the SLA: Σ(a_i * γ_i / n_i + b_i) ≤ SLA [equation 2,4]
- calculate initial latency targets based on Eq. (5): the optimal latency target of each microservice is proportional to the square root of the product of a_i, workload γ_i, and resource demand R_i (worked sketch below)
- a_i is the slope with respect to workload in the latency function of microservice i. It quantifies how sensitive the microservice's latency is to changes in workload: a larger a_i means the latency increases more rapidly with increasing workload.
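A reconstruction of that closed form under the constraint of Eq. (2)/(4), Σ(a_i * γ_i / n_i + b_i) ≤ SLA: the budget left after the intercepts b_i is split in proportion to sqrt(a_i * γ_i * R_i), so the targets sum exactly to the SLA. Variable names and example numbers are mine, not the paper's.

```python
# Sketch of the Eq. (5) latency-target split across a sequential chain.
import math

def allocate_latency_targets(sla, services):
    """services: dicts with slope a, workload gamma, intercept b, resource demand R."""
    weights = [math.sqrt(s["a"] * s["gamma"] * s["R"]) for s in services]
    slack = sla - sum(s["b"] for s in services)  # budget left after base latencies
    assert slack > 0, "SLA must exceed the sum of the intercepts b_i"
    # Each microservice's slack above b_i is proportional to its weight.
    return [s["b"] + slack * w / sum(weights) for w, s in zip(weights, services)]

# Example: three virtual microservices on the simplified chain, 100 ms SLA.
svcs = [{"a": 2.0, "gamma": 50, "b": 5, "R": 0.1},
        {"a": 0.5, "gamma": 200, "b": 10, "R": 0.1},
        {"a": 1.0, "gamma": 100, "b": 8, "R": 0.2}]
print([round(t, 1) for t in allocate_latency_targets(100, svcs)])  # sums to 100
```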
-
step3: Priority Scheduling and Resource Allocation [sec2.3, sec4.3, sec5.3.2, Figure 5]
- Addresses Challenge 2
-
- Priority Scheduling:
- 1.1. Priority Level Assignment: based on the initial latency targets computed in step2. For each shared microservice, Erms assigns priority levels to the services that use it; services with lower latency targets at a shared microservice are given higher priority.
- 1.2. Priority-based Scheduling: whenever a thread is available in a deployed container and there are requests waiting to be processed, a request from a higher-priority service is assigned to this thread with higher probability [sec5.3.2] (dispatch sketch after this list)
- 1.3. Latency Target Recomputation: for each service, Erms recomputes the latency targets of all its microservices. This recomputation takes into account the priority assigned to each service at shared microservices (e.g., for a service with priority k, its new workload at a shared microservice is the sum of the workloads from all services with priorities 1 to k).
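A toy sketch of the probabilistic dispatch in 1.2: a free thread draws from the non-empty per-service queues with weights biased toward higher priority. The halving-per-level weights are an illustrative assumption; the paper does not prescribe them here.

```python
# Sketch: probabilistic priority dispatch for a shared microservice's threads.
import random
from collections import deque

class PriorityDispatcher:
    def __init__(self, num_levels):
        self.queues = [deque() for _ in range(num_levels)]  # index 0 = highest priority

    def enqueue(self, level, request):
        self.queues[level].append(request)

    def next_request(self):
        """Pick a waiting request; higher-priority queues win with higher probability."""
        candidates = [(lvl, q) for lvl, q in enumerate(self.queues) if q]
        if not candidates:
            return None
        weights = [2.0 ** -lvl for lvl, _ in candidates]  # assumed: halve weight per level
        _, queue = random.choices(candidates, weights=weights, k=1)[0]
        return queue.popleft()

d = PriorityDispatcher(num_levels=3)
d.enqueue(0, "svc-A req")
d.enqueue(2, "svc-C req")
print(d.next_request())  # "svc-A req" with probability 4/5 under these weights
```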
-
- Resource Allocation: decides how many containers to allocate to each microservice, based on the initial latency targets and the assigned priorities. Each microservice container is configured with 0.1 CPU core and 200 MB of memory. [sec4.3]
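A sketch of the sizing arithmetic, inverting the latency model l = a * γ / n + b to get a container count and using the priority-stacked workload from the recomputation in 1.3. Function names and example numbers are mine; per the note, each container gets 0.1 core / 200 MB.

```python
# Sketch: container count for a shared microservice, per priority level.
import math

CPU_PER_CONTAINER, MEM_PER_CONTAINER = 0.1, 200  # cores, MB (from the note)

def containers_needed(a, b, stacked_gamma, latency_target):
    """Smallest n with a * gamma / n + b <= latency_target."""
    assert latency_target > b, "target must exceed base latency b"
    return math.ceil(a * stacked_gamma / (latency_target - b))

# A priority-k service sees the summed workload of priorities 1..k (rule in 1.3).
workloads = {1: 120, 2: 80, 3: 60}     # requests/s per priority level (illustrative)
targets = {1: 12.0, 2: 18.0, 3: 25.0}  # ms, from the target recomputation
a, b = 0.5, 4.0                        # profiled slope and intercept (illustrative)
for k in sorted(workloads):
    stacked = sum(w for j, w in workloads.items() if j <= k)
    n = containers_needed(a, b, stacked, targets[k])
    print(f"priority {k}: stacked workload {stacked}, {n} containers -> "
          f"{n * CPU_PER_CONTAINER:.1f} cores, {n * MEM_PER_CONTAINER} MB")
```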
-
step4: Dynamic Resource Provisioning [sec5.4]
- decides where to place the allocated containers across physical hosts to minimize resource interference and balance utilization.
- For each host, the resource imbalance is defined as the difference between the host's utilization and the cluster-wide utilization.
- uses POP optimization to solve this problem. The main idea is to statically divide the hosts of a cluster into multiple equal-size groups and solve a much smaller-scale optimization problem within each group.
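A toy illustration of the POP-style decomposition: randomly partition hosts into equal groups, give each group a share of the containers, and solve a small placement per group. The greedy least-loaded inner step here is my simplification standing in for the real per-group optimization that minimizes imbalance against cluster-wide utilization.

```python
# Sketch: partitioned (POP-style) container placement across hosts.
import random

def pop_place(hosts, containers, num_groups):
    """hosts: dict host_id -> current utilization; containers: list of demands."""
    ids = list(hosts)
    random.shuffle(ids)                        # static random partition of hosts
    groups = [ids[i::num_groups] for i in range(num_groups)]
    placement = {}
    for g, group in enumerate(groups):
        share = containers[g::num_groups]      # this group's slice of containers
        for c, demand in enumerate(share):
            # Greedy stand-in: put the container on the least-utilized host.
            target = min(group, key=lambda h: hosts[h])
            hosts[target] += demand
            placement[(g, c)] = target
    return placement

hosts = {f"h{i}": random.uniform(0.1, 0.4) for i in range(8)}
print(pop_place(hosts, [0.05] * 12, num_groups=2))
```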