Group Service Acceleration
/area autoscale
Describe the feature
Knative currently lacks a way to handle a group service (i.e., A and B are different Knative services that together form a composite function; the request path is: the user-client requests ksvc A, then ksvc A requests ksvc B). As a result, cloud providers may encounter very long response times (ksvc A cold-start time + ksvc B cold-start time). To alleviate this issue, this feature track proposes implementing group-service-acceleration functionality that scales up all services of a group at the same time when they need to call each other. This will reduce the response time of requesting a group service; the longer the request path, the more response time is saved. See the feature proposal in: https://docs.google.com/document/d/1bjjNecqqN0Bun7gP4oooqNOkpDPjqmbSGgCeJsT4hGQ
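To illustrate the request path, here is a minimal sketch (the service name and in-cluster URL are hypothetical) of ksvc A synchronously calling ksvc B; when both services are scaled to zero, the user-client pays both cold starts back to back:

package main

import (
    "io"
    "net/http"
)

// Handler of ksvc A: every request synchronously calls ksvc B, so the
// end-to-end latency includes A's cold start plus B's cold start.
func handler(w http.ResponseWriter, r *http.Request) {
    // Hypothetical in-cluster URL of ksvc B.
    resp, err := http.Get("http://ksvc-b.default.svc.cluster.local")
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadGateway)
        return
    }
    defer resp.Body.Close()
    io.Copy(w, resp.Body) // relay B's answer back to the user-client
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}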
@jwcesign: The label(s) kind/autoscale cannot be applied, because the repository doesn't have them.
In response to this:
/kind autoscale
cc @psschwei @mattmoor @dprotaso
See also: https://docs.google.com/presentation/d/1bgoRwNCHqFxFF_JUFtQvtBYVXBmO4u9voWxeFkXISLo/edit
@mattmoor it looks like the same core idea as traceId, but from my first impression it is quite difficult to implement with traceId; the advantage would be less configuration and better usability. Which implementation approach do you recommend?
Here is the test I did for the time cost with group services. https://github.com/jwcesign/group-service-acceleration
Per today's WG discussion, next step is to reach out to the Flows WG / task force and see if there is any possible overlap with them.
FYI @rhuss @lionelvillard
There is no longer a Flows task force, due to the lack of participation.
Based on the meeting conclusion, I rewrote the proposal to follow these points:
- Define Knative service dependencies with a higher abstraction layer.
- The implementation should live outside of Knative.
- The implementation should be compatible with the Eventing workflow.
Here is the new feature proposal: https://docs.google.com/document/d/1CmVsbQv6oWtJj3oo-gku0g3fiZRHd_ngumPxne8ESpY/edit?usp=sharing
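As a rough illustration of such a higher abstraction layer (the type and field names below are only assumptions for discussion, not the API defined in the proposal), the service dependencies could be declared with CRD-style Go types along these lines:

// Hypothetical sketch of ApplicationGraph types; see the proposal for the real API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ApplicationGraph declares call dependencies between Knative services so
// that a controller outside of Knative can scale a whole group up together.
type ApplicationGraph struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec ApplicationGraphSpec `json:"spec"`
}

type ApplicationGraphSpec struct {
    // Vertices lists the services in the group and who they call.
    Vertices []Vertex `json:"vertices"`
}

type Vertex struct {
    // Name of the Knative service.
    Name string `json:"name"`
    // Calls names the downstream services this service requests.
    Calls []string `json:"calls,omitempty"`
}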
cc @nader-ziada @psschwei @lionelvillard @ricardozanini @rhuss @ro14nd
At least on the Kogito side, we can map the possible "Group Services" based on the workflow definition functions. This way we can warm up the target services.
It would be nice to have an API on the Knative side to map these relationships so that the platform could maybe warm things up ahead of time.
> It would be nice to have an API on the Knative side to map these relationships so that the platform could maybe warm things up ahead of time.
I guess if the abstraction-layer CRD ApplicationGraph is installed, there is a related API to map these relationships. So you mean installing this CRD by default to enable this API?
Well, from the perspective of a workflow engine running on top of Knative, I'd say that we can create the ApplicationGraph based on a workflow definition, yes. These workflow engines can consume a well-defined API for ApplicationGraph and implement this use case.
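To make this concrete, here is a minimal sketch of how a workflow engine could derive those relationships from its workflow definition (the WorkflowStep shape and service names are stand-ins for illustration, not Kogito's actual model):

package main

import "fmt"

// WorkflowStep is a stand-in for one step of a workflow definition:
// it invokes a Knative service and then hands off to the next step.
type WorkflowStep struct {
    Service string // Knative service called by this step
    Next    string // service called afterwards, empty if this is the last step
}

// buildGraph maps a workflow definition onto service dependencies, i.e. the
// edges an ApplicationGraph-style resource would declare for warm-up.
func buildGraph(steps []WorkflowStep) map[string][]string {
    graph := make(map[string][]string)
    for _, s := range steps {
        if s.Next != "" {
            graph[s.Service] = append(graph[s.Service], s.Next)
        } else if _, ok := graph[s.Service]; !ok {
            graph[s.Service] = nil // leaf service with no downstream calls
        }
    }
    return graph
}

func main() {
    steps := []WorkflowStep{
        {Service: "ksvc-a", Next: "ksvc-b"},
        {Service: "ksvc-b", Next: "ksvc-c"},
        {Service: "ksvc-c"},
    }
    // Warming up ksvc-a therefore implies warming up ksvc-b and ksvc-c as well.
    fmt.Println(buildGraph(steps))
}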
Simulation
To better check the response time, I wrote a simulation program:
package main

import (
    "fmt"
    "net/http"
    "sync"
    "time"
)

// Simulation parameters.
var ksvcLayer = 3                                  // number of chained ksvc layers, e.g. a->b->c
var scaleupTime = time.Second * 3                  // time it takes to scale up one layer
var handleTime = time.Millisecond * 100            // handling time of a single request
var ksvcLayerCache = make([]chan int, 1000)        // per-layer request queues
var ksvcLayerBackCache = make([][]chan int, 10000) // per-request, per-layer response channels
var handleConcurrency = 5                          // user-container concurrency
var startReplicas = 5                              // initial replicas per layer
var scaleUpReplicas = 5                            // extra replicas added on scale-up

// launchOneReplicas simulates one replica of the ksvc at layerIndex: it takes
// requests from "me", handles them, forwards them to "next" (the downstream
// layer), waits for the downstream answer, and then replies upstream.
func launchOneReplicas(me, next chan int, layerIndex int) {
    for i := 0; i < handleConcurrency; i++ {
        go func() {
            for t := range me {
                time.Sleep(handleTime)
                if layerIndex == ksvcLayer-1 {
                    // Last layer: reply directly.
                    ksvcLayerBackCache[t][layerIndex] <- t
                    continue
                }
                // Forward to the next layer, wait for its reply, then reply upstream.
                next <- t
                <-ksvcLayerBackCache[t][layerIndex+1]
                ksvcLayerBackCache[t][layerIndex] <- t
            }
        }()
    }
}

var once sync.Once
var scaleUpTrigger = make(chan struct{})
var serialIndex = 0
var mutex sync.Mutex

// serverHandler gives every request a serial number, pushes it into the first
// layer's queue and waits for the first layer's reply.
func serverHandler(w http.ResponseWriter, req *http.Request) {
    // The very first request triggers the (delayed) scale-up.
    once.Do(func() {
        scaleUpTrigger <- struct{}{}
    })
    mutex.Lock()
    now := serialIndex
    serialIndex++
    mutex.Unlock()
    ksvcLayerBackCache[now] = make([]chan int, 1000)
    for i := 0; i < ksvcLayer; i++ {
        ksvcLayerBackCache[now][i] = make(chan int, 1000)
    }
    ksvcLayerCache[0] <- now
    <-ksvcLayerBackCache[now][0]
    w.Write([]byte("OK"))
}

func main() {
    fmt.Println(time.Now())
    for i := 0; i < ksvcLayer+1; i++ {
        ksvcLayerCache[i] = make(chan int, 1000)
    }
    // Start the initial replicas of every layer.
    for i := 0; i < ksvcLayer; i++ {
        for j := 0; j < startReplicas; j++ {
            launchOneReplicas(ksvcLayerCache[i], ksvcLayerCache[i+1], i)
        }
    }
    go func() {
        // Scale up when triggered by the first request.
        <-scaleUpTrigger
        // Scale up the ksvc layers one by one.
        time.Sleep(scaleupTime)
        fmt.Println("Scale ksvc layer:", 0)
        for j := 0; j < scaleUpReplicas; j++ {
            launchOneReplicas(ksvcLayerCache[0], ksvcLayerCache[1], 0)
        }
        for i := 1; i < ksvcLayer; i++ {
            time.Sleep(scaleupTime) // Here, enable or disable ApplicationGraph.
            fmt.Println("Scale ksvc layer:", i)
            for j := 0; j < scaleUpReplicas; j++ {
                launchOneReplicas(ksvcLayerCache[i], ksvcLayerCache[i+1], i)
            }
        }
    }()
    http.HandleFunc("/", serverHandler)
    http.ListenAndServe(":8090", nil)
}
The test results below were obtained with the following configuration:
// Simulation config
var ksvcLayer = 3                       // there are three layers of ksvc, like a->b->c
var scaleupTime = time.Second * 3       // time it takes to scale up a layer
var handleTime = time.Millisecond * 100 // handling time of one request
var handleConcurrency = 5               // user-container concurrency config
var startReplicas = 5                   // initial replicas
var scaleUpReplicas = 5                 // how many more replicas are added on scale-up
Without ApplicationGraph:
$ ./hey -n 1000 -c 50 http://127.0.0.1:8090
Summary:
Total: 8.7293 secs
Slowest: 0.6527 secs
Fastest: 0.3152 secs
Average: 0.4311 secs
Requests/sec: 114.5571
Total data: 2000 bytes
Size/request: 2 bytes
Response time histogram:
0.315 [1] |
0.349 [424] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.383 [0] |
0.416 [0] |
0.450 [325] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.484 [0] |
0.518 [0] |
0.551 [25] |■■
0.585 [0] |
0.619 [0] |
0.653 [225] |■■■■■■■■■■■■■■■■■■■■■
With ApplicationGraph:
$ ./hey -n 1000 -c 50 http://127.0.0.1:8090
Summary:
Total: 7.9874 secs
Slowest: 0.6417 secs
Fastest: 0.3153 secs
Average: 0.3947 secs
Requests/sec: 125.1969
Total data: 2000 bytes
Size/request: 2 bytes
Response time histogram:
0.315 [1] |
0.348 [749] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.381 [0] |
0.413 [0] |
0.446 [0] |
0.479 [25] |■
0.511 [0] |
0.544 [0] |
0.576 [0] |
0.609 [0] |
0.642 [225] |■■■■■■■■■■■■
So from the results: with ApplicationGraph, the total time cost is shorter and the response-time distribution is better.
Do we understand why the histogram has the bumps it does?
The detailed calculation is quite complex, which is why I wrote the simulation program.
But the main reason should be:
Without ApplicationGraph, there is only the following state:
- whole-system concurrency 25 (lasts 3s):
  - time cost is three-layer wait time + three-layer handle time = 0.6
- the rest of the time the whole-system concurrency is 50.
With ApplicationGraph, there are the following states:
- whole-system concurrency 25 (lasts 9s):
  - time cost is three-layer wait time + three-layer handle time = 0.6
  - time cost is 1st-layer handle time + two-layer wait time + two-layer handle time = 0.5
  - time cost is two-layer handle time + one-layer wait time + one-layer handle time = 0.4
- the rest of the time the whole-system concurrency is 50.

cc @evankanderson
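As a back-of-the-envelope check of those buckets (assuming the queueing wait at every layer that has not scaled up yet is roughly one handleTime, i.e. 0.1s, which is consistent with the simulation config above):

package main

import "fmt"

func main() {
    const handleTime = 0.1 // seconds, same as the simulation config
    const layers = 3

    // coldLayers = layers where a request still queues for roughly one extra
    // handleTime on top of being handled (an assumption for this estimate).
    for coldLayers := layers; coldLayers >= 1; coldLayers-- {
        latency := float64(layers)*handleTime + float64(coldLayers)*handleTime
        fmt.Printf("%d cold layer(s): ~%.1fs\n", coldLayers, latency)
    }
    // Prints ~0.6s, ~0.5s and ~0.4s, matching the buckets listed above.
}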
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.