What does the concurrency really mean?

Open sj8d3e6602 opened this issue 3 years ago • 5 comments

In what area(s)?

/area metrics

The question is about the metrics system. Let's assume that:

  • there is only one autoscaler and one activator, which reports its metrics to the autoscaler every second
  • the response time is infinite, which means a request never ends (the timeout config is also infinite)

In my opinion, concurrency is just the number of requests that the activator (and the pods behind it) is currently handling. That means that if one request arrives in the first second, another arrives in the second second, and so on, then the concurrency is 1 at the first second, 2 at the second second, and so on. Is that right?

However, in Knative the activator resets its data after each periodic metrics report, which means it no longer knows about the requests it received during the previous reporting period. So it will report an AverageConcurrency of 1 at the 1st second, another AverageConcurrency of 1 at the 2nd second, and so on. Thus the autoscaler will think the concurrency is equal to one for the whole StableWindow (60s). But all the requests are actually still being processed by the backend. **So the question is**: what is the right concurrency after the whole StableWindow, 60 or 1?
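To make the two readings in the question concrete, here is a minimal Python sketch of the scenario (one new request per second, no request ever finishes). It is only an illustration of the two counting models being compared, not Knative's actual activator code; the function names `in_flight_gauge` and `reset_counter` are made up for this example.

```python
def in_flight_gauge(seconds):
    """Model 1: concurrency is the number of requests currently in flight."""
    in_flight = 0
    samples = []
    for _ in range(seconds):
        in_flight += 1  # one new arrival per second, nothing ever completes
        samples.append(in_flight)
    return samples


def reset_counter(seconds):
    """Model 2: the counter is reset after every report, so each report
    only sees the single request that arrived in that second."""
    return [1 for _ in range(seconds)]


window = 60  # StableWindow in seconds, as in the question
gauge = in_flight_gauge(window)
counter = reset_counter(window)

print(gauge[-1], sum(gauge) / window)      # 60 30.5
print(counter[-1], sum(counter) / window)  # 1 1.0
```

Under the first model the last sample is 60 and the mean over the window is 30.5; under the second model both stay at 1. The question is which of these the autoscaler should see.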

Thanks a lot.

sj8d3e6602 • Apr 25 '22 07:04

A couple of points:

  • The Revision ContainerConcurrency field specifies the maximum number of requests the Container can handle at once. The container concurrency target percentage is how much of that maximum to use in a stable state. E.g. if a Revision specifies a ContainerConcurrency of 10, then with the default target percentage of 70% the Autoscaler will try to maintain 7 concurrent requests per pod on average.

  • The Autoscaler calculates the desired number of instances based on the average number of concurrent requests, collected into 2-second buckets and then averaged over the stable (60 s) and panic (6 s) windows. The desired number of instances is derived from the ratio between the average concurrent requests per instance and the target concurrent requests per instance.

So if you are asking about when the Autoscaler decides to scale up, that happens based on the average number of concurrent requests over the 60s window.
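The arithmetic in the two points above can be sketched as follows. This is a simplified illustration, not the Autoscaler's implementation: it ignores panic mode, scale bounds, and rate limits, the helper names `per_pod_target` and `desired_pods` are hypothetical, and the 70% default is taken from the "10 gives 7" example above.

```python
import math


def per_pod_target(container_concurrency, target_percentage=70.0):
    """Target concurrent requests per pod in a stable state."""
    return container_concurrency * target_percentage / 100.0


def desired_pods(ready_pods, avg_concurrency_per_pod, target):
    """Scale the current pod count by the ratio of observed to target
    per-pod concurrency, rounding up to a whole pod."""
    return math.ceil(ready_pods * avg_concurrency_per_pod / target)


target = per_pod_target(10)
print(target)                         # 7.0
print(desired_pods(1, 30.5, target))  # 5
print(desired_pods(1, 1.0, target))   # 1
```

With a single pod, an observed average of 30.5 concurrent requests asks for 5 pods, while an observed average of 1 asks for no scale-up at all, which is why the meaning of the reported concurrency matters.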

nader-ziada • Apr 28 '22 20:04

@nader-ziada Thanks a lot.

Hmm... It looks like I was not understood well. What I really care about is not how Knative reacts to the metrics it collects, but how it interprets them.

I just cannot understand what "concurrency" means in Knative. It seems that concurrency is not the number of requests the app is serving but the rate at which they arrive?

sj8d3e6602 • May 07 '22 08:05

Concurrency is the total number of requests that have arrived and are either waiting in a queue or being processed. So if all the requests are still being processed, the average should be 60.
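One way to read this definition is as a time-weighted count of requests that have started but not yet finished. The sketch below computes that average over an arbitrary window from (start, end) pairs. It is an assumption-laden illustration rather than how the queue-proxy or activator measures it; `average_concurrency` is a hypothetical helper, and a request that never finishes is modeled with an end time of infinity.

```python
def average_concurrency(requests, window_start, window_end):
    """Time-weighted average number of requests in flight during
    [window_start, window_end). Each request is a (start, end) pair."""
    busy_time = 0.0
    for start, end in requests:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            busy_time += overlap
    return busy_time / (window_end - window_start)


# Three finite requests: 8 request-seconds of work inside a 4 s window.
finite = [(0.0, 4.0), (1.0, 3.0), (2.0, 4.0)]
print(average_concurrency(finite, 0.0, 4.0))  # 2.0

# The scenario from the question: one arrival per second, none finishes.
never = float("inf")
endless = [(float(i), never) for i in range(60)]
print(average_concurrency(endless, 59.0, 60.0))  # 60.0
```

In the last second of the window all 60 requests are in flight, so the most recent measurement is 60 as long as in-flight requests are carried over between reports.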

nader-ziada • May 09 '22 14:05

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] • Aug 08 '22 01:08

@sj8d3e6602 that's indeed an interesting question. I'm not an autoscaling expert, but I thought that it is the queue-proxy that reports the requests-in-flight metric to the autoscaler, since only the queue-proxy knows about this data, especially because the activator is bypassed when the serving pods have enough burst capacity (see https://docs.google.com/document/d/1ypS1Tyim-1zojrtFaG8i5AqXmmxhRYiV8X3CBzHvCSs/edit#bookmark=id.ghzzr9j07rul).

@dprotaso do you have any insights here?

rhuss • Aug 08 '22 07:08

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] • Nov 08 '22 01:11

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

knative-prow-robot • Dec 08 '22 02:12