targetBurstCapacity does not behave correctly when there is more than one activator pod
Expected Behavior
See this post for background, where high-availability.replicas was set to 2.
When targetBurstCapacity is set to '-1', the activator should hold on to requests until a pod becomes available to process them. This means that if only 1 pod is available (with containerConcurrency: 1) and 2 long-running requests are made, the first request should be processed by the first pod, and a second pod should be scaled up to service the second request.
Actual Behavior
When 2 long running requests are made, both requests seem to be sent to the first pod. A second pod is created, but never used.
When high-availability.replicas was reduced to 1 so that only 1 activator was present, the expected behavior was observed and targetBurstCapacity seemed to work correctly.
Steps to Reproduce the Problem
- A cluster with 2 activator pods and 1 running worker pod
- Set targetBurstCapacity: '-1', containerConcurrency: 1, and minScale: 1
- Send 2 long-running requests (web socket connections in this case)
- Observe that both requests are sent to the same worker and newly scaled pods are not used (see the sketch below)
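For reference, a rough reproduction sketch in Go, using plain long-running HTTP requests instead of web sockets. The URL and the assumption that the test app echoes its pod hostname in the response body are made up for illustration, not part of the actual setup:

```go
// Hypothetical reproduction sketch: the URL is made up, and it assumes the
// test app holds each request open for a while and echoes its pod hostname
// (e.g. os.Getenv("HOSTNAME")) in the response body.
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	const url = "http://my-svc.default.example.com/slow" // hypothetical Knative route
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			resp, err := http.Get(url)
			if err != nil {
				fmt.Printf("request %d failed: %v\n", id, err)
				return
			}
			defer resp.Body.Close()
			body, _ := io.ReadAll(resp.Body)
			// With containerConcurrency: 1 we'd expect two different pod names here;
			// the reported behavior is that both requests print the same pod.
			fmt.Printf("request %d served by: %s\n", id, body)
		}(i)
	}
	wg.Wait()
}
```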
Additional Info
Running on OCP, Serving version v0.19.0, with the Kourier gateway.
Thanks for opening this!
We should:
- Check if our subsetting is behaving correctly (I assume it is)
- Discuss if we want to allow for a minActivators of 1 if only 1 replica is deployed for the backed revision.
This is sort of the edgiest edge case for our "perfect routing" and I think we kinda knew it'd be there. I'm not sure if we can fix it 100% given all the eventually-consistent behavior but mayyyybe the second point above would help.
To clarify additionally: targetBurstCapacity behaves correctly here. It's only making sure that the activators are in the path. What's not behaving as expected is the activator's routing and load-balancing behavior.
- TBC is not an issue here at all — TBC in fact works fine — requests go through the activator.
- Generally, this is a known issue: when choosing 2 as the minimum number of activators for a revision, we were more concerned about no activator remaining if the one in question is restarting/crashing/its pod is gone, etc., than about this edge case.
Some context from a slack thread we had about this:
One solution to this edge case, which we and OpenWhisk considered at one point, is that when the minActivators/pods ratio means we can't properly subset, we could have the activators forward the request to the 'correct' activator. This would maintain availability, but keep all requests (when possible) on the activator with the correct knowledge, avoiding this edge case. The extra hop seems likely to be more than made up for by the better load balancing in cases like this.
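To sketch the idea: the hypothetical helpers ownerFor and serveLocally below stand in for the real subsetting and proxying logic; this is not actual activator code, just an illustration of the forwarding path.

```go
// Illustrative sketch of "forward to the owning activator": if this activator
// does not own the revision's subset, proxy the request to the activator that
// does, and only handle it locally when that activator is unreachable.
package sketch

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// ownerFor returns the address of the activator that "owns" the revision's
// subset, and false if this activator is the owner itself. Hypothetical.
func ownerFor(r *http.Request) (*url.URL, bool) { return nil, false }

// serveLocally is a stand-in for the activator's normal throttling and
// proxying path to the revision's pods. Hypothetical.
func serveLocally(w http.ResponseWriter, r *http.Request) {}

func Handle(w http.ResponseWriter, r *http.Request) {
	owner, forward := ownerFor(r)
	if !forward {
		// We own this revision's subset: load-balance to the pods ourselves.
		serveLocally(w, r)
		return
	}
	// Otherwise forward to the owning activator, so only one activator is
	// actually balancing to the (possibly single) pod...
	proxy := httputil.NewSingleHostReverseProxy(owner)
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, _ error) {
		// ...but if the owner is down, preserve availability by serving it here.
		serveLocally(w, r)
	}
	proxy.ServeHTTP(w, r)
}
```

The fallback in ErrorHandler is what preserves availability if the owning activator is down, while the normal path keeps all requests for the revision on the activator with the correct load-balancing knowledge.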
/triage accepted /help
@julz: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
Hey team, I'm interested in fixing this. As suggested earlier in the thread, I went to check the subsetting logic and I think there is an issue around https://github.com/knative/serving/blob/44f60331bbfd21eb6a2bb324812f401aabb2d3d4/pkg/activator/net/throttler.go#L371-L376 This means that when there is one pod and multiple activators, all activators will think they get the pod assigned to themselves. My guess of what happened is that the two HTTP requests went to two different activators, which each thought the same pod had one slot available and routed the traffic there. Does this make sense?
Of course there might still be issues even when the subsetting logic is correct, due to eventual consistency, but fixing this should improve the situation.
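Roughly the behavior I mean, as a simplified illustration (this is not the actual code at the link above, just a sketch of the assignment fallback):

```go
// Purely illustrative: with a single pod, every activator ends up with that
// pod in its slice, so each one believes it has a free slot for it.
package sketch

// assign returns the subset of pod addresses this activator should route to.
func assign(pods []string, selfIndex, numActivators int) []string {
	// With one pod (or no ordinal for this activator) we cannot subset
	// meaningfully, so every activator keeps the full set.
	if selfIndex == -1 || len(pods) <= 1 {
		return pods
	}
	// Otherwise pick roughly an even share by activator ordinal (simplified).
	var mine []string
	for i, p := range pods {
		if i%numActivators == selfIndex {
			mine = append(mine, p)
		}
	}
	return mine
}
```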
/assign
@nealhu I think this is correct, but also (iiuc) intentional. When there is only one pod, we don't want to subset it to a single activator, because then if that activator happens to crash/get restarted we have an outage (rather than the request hitting another activator and succeeding). This is why we were thinking another solution might be that if a request hits the wrong activator, it could forward it to the right activator (if it's up), and otherwise make the request itself.
Thanks @julz, just to expand a bit on what you said to make sure I understand: say there are 2 activators (A and B) and 1 pod.
If we don't subset (current behavior)
- Both A and B can send traffic to the pod
- If a client has a connection to the pod via A and A crashes, the client will reconnect to B and connect to the pod via B immediately
If we subset
- Say A can send traffic to the pod; B won't send traffic to anybody
- If a client has a connection to the pod via A and A crashes, the client will reconnect to B, and B will hold the traffic until it knows that A is down and takes over the pod (see the sketch below) https://github.com/knative/serving/blob/44f60331bbfd21eb6a2bb324812f401aabb2d3d4/pkg/activator/net/throttler.go#L217-L237
- Compared to the current behavior, when A is down the client will experience additional delay but maybe not an error?
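Conceptually something like this (hypothetical types, not the real throttler code): B's requests block until an endpoints/subsetting update grants it the pod's slot after A goes away.

```go
// Hypothetical types only: each activator proxies a request only once it
// holds a slot for the revision, so an activator that was assigned nothing
// (B) holds requests until a subsetting update grants it the pod.
package sketch

import "context"

type revisionThrottler struct {
	capacity chan struct{} // one token per slot this activator may use
}

func newRevisionThrottler(maxSlots int) *revisionThrottler {
	return &revisionThrottler{capacity: make(chan struct{}, maxSlots)}
}

// grant is called when an endpoints/subsetting update assigns n slots here.
func (rt *revisionThrottler) grant(n int) {
	for i := 0; i < n; i++ {
		rt.capacity <- struct{}{}
	}
}

// acquire blocks the request until this activator owns a slot or the
// request's context is cancelled.
func (rt *revisionThrottler) acquire(ctx context.Context) error {
	select {
	case <-rt.capacity:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```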
The proposed solution is for B to
- Be aware that A is handling the traffic for the pod, e.g. via the subsetting logic or leader election
- Be aware of other activators' addresses
- When it receives traffic for the pod, first try to forward the traffic to where it belongs (A); if that fails (e.g. A is down), handle the traffic by itself
Am I correct?
I think that's mostly correct, but I'm thinking less about the case where an activator crashes mid-request and more about activator crashes in general. Right now we always assign at least 2 activators to a revision, because then if 1 crashes there's still another one available to accept requests. The downside of that is we need to subset, and so we get non-ideal load balancing (because both activators are balancing to only 1 pod, so they interfere with each other). We could solve this by only assigning 1 activator to the revision, but then if that crashed the revision would have no activators at all and would be completely unroutable for all requests until we were able to spin up a new activator and add it to the endpoints. The proposed solution is to keep 2 activators in this case (so if 1 crashes we still have at least 1 activator fronting the service), but have the 2nd forward requests to the first when it's up, so that only 1 activator is actually proxying to pods in the normal case.
/unassign @nealhu