Group Service Acceleration
/area autoscale
Describe the feature
Knative currently lacks a way to handle a group service (i.e., A and B are different Knative services that together form a composite function; the request path is: the user-client requests ksvc A, then ksvc A requests ksvc B). As a result, cloud providers may encounter very long response times (ksvc A cold-start time + ksvc B cold-start time). To alleviate this issue, this feature track proposes implementing group-service-acceleration functionality that scales up all services of a group at the same time when they need to call each other. This will reduce the response time of requesting a group service; the longer the request path, the more response time is saved. See the feature proposal in: https://docs.google.com/document/d/1bjjNecqqN0Bun7gP4oooqNOkpDPjqmbSGgCeJsT4hGQ
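To illustrate the request path, here is a minimal sketch (the service name and in-cluster URL are hypothetical) of ksvc A synchronously calling ksvc B; when both services are scaled to zero, the user-client pays both cold starts back to back:

package main

import (
    "io"
    "net/http"
)

// Handler of ksvc A: every request synchronously calls ksvc B, so the
// end-to-end latency includes A's cold start plus B's cold start.
func handler(w http.ResponseWriter, r *http.Request) {
    // Hypothetical in-cluster URL of ksvc B.
    resp, err := http.Get("http://ksvc-b.default.svc.cluster.local")
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadGateway)
        return
    }
    defer resp.Body.Close()
    io.Copy(w, resp.Body) // relay B's answer back to the user-client
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}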
@jwcesign: The label(s) kind/autoscale cannot be applied, because the repository doesn't have them.
In response to this:
/kind autoscale
cc @psschwei @mattmoor @dprotaso
See also: https://docs.google.com/presentation/d/1bgoRwNCHqFxFF_JUFtQvtBYVXBmO4u9voWxeFkXISLo/edit
@mattmoor it looks like the same core idea as traceId, but from my first impression it is quite difficult to implement with traceId; the advantage would be less configuration and better usability. Which implementation approach do you recommend?
Here is the test I did for the time cost with group services. https://github.com/jwcesign/group-service-acceleration
Per today's WG discussion, next step is to reach out to the Flows WG / task force and see if there is any possible overlap with them.
FYI @rhuss @lionelvillard
There is no longer a Flows task force, due to the lack of participation.
Based on the meeting conclusion, I rewrote the proposal to follow these points:
- Define Knative service dependencies with a higher abstraction layer.
- The implementation should live outside of Knative.
- The implementation should be compatible with the Eventing workflow.
Here is the new feature proposal: https://docs.google.com/document/d/1CmVsbQv6oWtJj3oo-gku0g3fiZRHd_ngumPxne8ESpY/edit?usp=sharing
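As a rough illustration of such a higher abstraction layer (the type and field names below are only assumptions for discussion, not the API defined in the proposal), the service dependencies could be declared with CRD-style Go types along these lines:

// Hypothetical sketch of ApplicationGraph types; see the proposal for the real API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ApplicationGraph declares call dependencies between Knative services so
// that a controller outside of Knative can scale a whole group up together.
type ApplicationGraph struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec ApplicationGraphSpec `json:"spec"`
}

type ApplicationGraphSpec struct {
    // Vertices lists the services in the group and who they call.
    Vertices []Vertex `json:"vertices"`
}

type Vertex struct {
    // Name of the Knative service.
    Name string `json:"name"`
    // Calls names the downstream services this service requests.
    Calls []string `json:"calls,omitempty"`
}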
cc @nader-ziada @psschwei @lionelvillard @ricardozanini @rhuss @ro14nd
At least on the Kogito side, we can map the possible "Group Services" based on the workflow definition functions. This way we can warm up the target services.
It would be nice to have an API on the Knative side to map these relationships so that the platform could maybe warm things up ahead of time.
> It would be nice to have an API on the Knative side to map these relationships so that the platform could maybe warm things up ahead of time.
I guess if the abstraction-layer CRD ApplicationGraph is installed, there is a related API to map these relationships. So you mean installing this CRD by default to enable this API?
Well, from the perspective of a workflow engine running on top of Knative, I'd say that we can create the ApplicationGraph based on a workflow definition, yes. These workflow engines can consume a well-defined API for ApplicationGraph and implement this use case.
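To make this concrete, here is a minimal sketch of how a workflow engine could derive those relationships from its workflow definition (the WorkflowStep shape and service names are stand-ins for illustration, not Kogito's actual model):

package main

import "fmt"

// WorkflowStep is a stand-in for one step of a workflow definition:
// it invokes a Knative service and then hands off to the next step.
type WorkflowStep struct {
    Service string // Knative service called by this step
    Next    string // service called afterwards, empty if this is the last step
}

// buildGraph maps a workflow definition onto service dependencies, i.e. the
// edges an ApplicationGraph-style resource would declare for warm-up.
func buildGraph(steps []WorkflowStep) map[string][]string {
    graph := make(map[string][]string)
    for _, s := range steps {
        if s.Next != "" {
            graph[s.Service] = append(graph[s.Service], s.Next)
        } else if _, ok := graph[s.Service]; !ok {
            graph[s.Service] = nil // leaf service with no downstream calls
        }
    }
    return graph
}

func main() {
    steps := []WorkflowStep{
        {Service: "ksvc-a", Next: "ksvc-b"},
        {Service: "ksvc-b", Next: "ksvc-c"},
        {Service: "ksvc-c"},
    }
    // Warming up ksvc-a therefore implies warming up ksvc-b and ksvc-c as well.
    fmt.Println(buildGraph(steps))
}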
Simulation
To better check the response time, I wrote a simulation program:
package main

import (
    "fmt"
    "net/http"
    "sync"
    "time"
)

// Simulation parameters.
var ksvcLayer = 3                                  // number of chained ksvc layers, e.g. a->b->c
var scaleupTime = time.Second * 3                  // time it takes to scale up one layer
var handleTime = time.Millisecond * 100            // handling time of a single request
var ksvcLayerCache = make([]chan int, 1000)        // per-layer request queues
var ksvcLayerBackCache = make([][]chan int, 10000) // per-request, per-layer response channels
var handleConcurrency = 5                          // user-container concurrency
var startReplicas = 5                              // initial replicas per layer
var scaleUpReplicas = 5                            // extra replicas added on scale-up

// launchOneReplicas simulates one replica of the ksvc at layerIndex: it takes
// requests from "me", handles them, forwards them to "next" (the downstream
// layer), waits for the downstream answer, and then replies upstream.
func launchOneReplicas(me, next chan int, layerIndex int) {
    for i := 0; i < handleConcurrency; i++ {
        go func() {
            for t := range me {
                time.Sleep(handleTime)
                if layerIndex == ksvcLayer-1 {
                    // Last layer: reply directly.
                    ksvcLayerBackCache[t][layerIndex] <- t
                    continue
                }
                // Forward to the next layer, wait for its reply, then reply upstream.
                next <- t
                <-ksvcLayerBackCache[t][layerIndex+1]
                ksvcLayerBackCache[t][layerIndex] <- t
            }
        }()
    }
}

var once sync.Once
var scaleUpTrigger = make(chan struct{})
var serialIndex = 0
var mutex sync.Mutex

// serverHandler gives every request a serial number, pushes it into the first
// layer's queue and waits for the first layer's reply.
func serverHandler(w http.ResponseWriter, req *http.Request) {
    // The very first request triggers the (delayed) scale-up.
    once.Do(func() {
        scaleUpTrigger <- struct{}{}
    })
    mutex.Lock()
    now := serialIndex
    serialIndex++
    mutex.Unlock()
    ksvcLayerBackCache[now] = make([]chan int, 1000)
    for i := 0; i < ksvcLayer; i++ {
        ksvcLayerBackCache[now][i] = make(chan int, 1000)
    }
    ksvcLayerCache[0] <- now
    <-ksvcLayerBackCache[now][0]
    w.Write([]byte("OK"))
}

func main() {
    fmt.Println(time.Now())
    for i := 0; i < ksvcLayer+1; i++ {
        ksvcLayerCache[i] = make(chan int, 1000)
    }
    // Start the initial replicas of every layer.
    for i := 0; i < ksvcLayer; i++ {
        for j := 0; j < startReplicas; j++ {
            launchOneReplicas(ksvcLayerCache[i], ksvcLayerCache[i+1], i)
        }
    }
    go func() {
        // Scale up when triggered by the first request.
        <-scaleUpTrigger
        // Scale up the ksvc layers one by one.
        time.Sleep(scaleupTime)
        fmt.Println("Scale ksvc layer:", 0)
        for j := 0; j < scaleUpReplicas; j++ {
            launchOneReplicas(ksvcLayerCache[0], ksvcLayerCache[1], 0)
        }
        for i := 1; i < ksvcLayer; i++ {
            time.Sleep(scaleupTime) // Here, enable or disable ApplicationGraph.
            fmt.Println("Scale ksvc layer:", i)
            for j := 0; j < scaleUpReplicas; j++ {
                launchOneReplicas(ksvcLayerCache[i], ksvcLayerCache[i+1], i)
            }
        }
    }()
    http.HandleFunc("/", serverHandler)
    http.ListenAndServe(":8090", nil)
}
The test results below were obtained with the following configuration:
// Simulation config
var ksvcLayer = 3                       // there are three layers of ksvc, like a->b->c
var scaleupTime = time.Second * 3       // time it takes to scale up a layer
var handleTime = time.Millisecond * 100 // handling time of one request
var handleConcurrency = 5               // user-container concurrency config
var startReplicas = 5                   // initial replicas
var scaleUpReplicas = 5                 // how many more replicas are added on scale-up
Without ApplicationGraph:
$ ./hey -n 1000 -c 50 http://127.0.0.1:8090
Summary:
Total: 8.7293 secs
Slowest: 0.6527 secs
Fastest: 0.3152 secs
Average: 0.4311 secs
Requests/sec: 114.5571
Total data: 2000 bytes
Size/request: 2 bytes
Response time histogram:
0.315 [1] |
0.349 [424] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.383 [0] |
0.416 [0] |
0.450 [325] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.484 [0] |
0.518 [0] |
0.551 [25] |■■
0.585 [0] |
0.619 [0] |
0.653 [225] |■■■■■■■■■■■■■■■■■■■■■
With ApplicationGraph:
$ ./hey -n 1000 -c 50 http://127.0.0.1:8090
Summary:
Total: 7.9874 secs
Slowest: 0.6417 secs
Fastest: 0.3153 secs
Average: 0.3947 secs
Requests/sec: 125.1969
Total data: 2000 bytes
Size/request: 2 bytes
Response time histogram:
0.315 [1] |
0.348 [749] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.381 [0] |
0.413 [0] |
0.446 [0] |
0.479 [25] |■
0.511 [0] |
0.544 [0] |
0.576 [0] |
0.609 [0] |
0.642 [225] |■■■■■■■■■■■■
So from the results: with ApplicationGraph, the total time cost is shorter and the response-time distribution is better.
Do we understand why the histogram has the bumps it does?
The detailed calculation is quite complex, which is why I wrote the simulation program.
But the main reason should be:
Without ApplicationGraph, there is only the following state:
- whole-system concurrency 25 (lasts 3s):
  - time cost is three-layer wait time + three-layer handle time = 0.6
- the rest of the time the whole-system concurrency is 50.
With ApplicationGraph, there are the following states:
- whole-system concurrency 25 (lasts 9s):
  - time cost is three-layer wait time + three-layer handle time = 0.6
  - time cost is 1st-layer handle time + two-layer wait time + two-layer handle time = 0.5
  - time cost is two-layer handle time + one-layer wait time + one-layer handle time = 0.4
- the rest of the time the whole-system concurrency is 50.

cc @evankanderson
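As a back-of-the-envelope check of those buckets (assuming the queueing wait at every layer that has not scaled up yet is roughly one handleTime, i.e. 0.1s, which is consistent with the simulation config above):

package main

import "fmt"

func main() {
    const handleTime = 0.1 // seconds, same as the simulation config
    const layers = 3

    // coldLayers = layers where a request still queues for roughly one extra
    // handleTime on top of being handled (an assumption for this estimate).
    for coldLayers := layers; coldLayers >= 1; coldLayers-- {
        latency := float64(layers)*handleTime + float64(coldLayers)*handleTime
        fmt.Printf("%d cold layer(s): ~%.1fs\n", coldLayers, latency)
    }
    // Prints ~0.6s, ~0.5s and ~0.4s, matching the buckets listed above.
}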
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.