serving icon indicating copy to clipboard operation
serving copied to clipboard

minScale treatment races reachability check

Open markusthoemmes opened this issue 4 years ago • 8 comments

In what area(s)?

/area autoscale

What version of Knative?

All recent at least.

Expected Behavior

autoscaling.knative.dev/minScale holds the respective scale of a revision after it becomes ready.

Actual Behavior

We sometimes scale below the mark only to then scale up again.

Example failing test: https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative_serving/11217/pull-knative-serving-istio-stable-no-mesh/1383957220156248064

Steps to Reproduce the Problem

This is a race with the labeler component, which labels all revisions targeted by a Route to be "Reachable". If a revision is unreachable, we ignore the minScale annotation, since nothing can hit the respective revision anyway. This allows more efficient resource usage for minScaled revisions.

Possible solutions

1. Make the Unreachable state more consistent with routing state

In the Revision reconciler, we translate a certain RoutingState (as given by the labeler) to a given Reachability state on the KPA spec (which then informs the minScale decision). We already have a pending state for Revisions that have not been reconciled by the labeler yet. That means, if we'd base strictly off of that RoutingState, this race should be fixed.

However: As it stands, it's possible that a revision is never touched by the labeler, for example if Configuration and Route are created independently. Such a Revision stays in pending forever and as such, would never scale down. That may or may not be fine.

2. Allow for the race

We can allow for the race and adjust the respective test to actually wait for route state to kick in before starting to assert the number of pods.


Personally I'd love to make 1 fly, to have a reliable and consistent RoutingState that can be used as a useful signal for things. We just need to be aware to not change semantics in an unintended way, as Garbage Collection is relying on this mechanism too. Arguably, the semantics we're after are actually quite close to those of Garbage Collection so that might be a good thing!

markusthoemmes avatar Apr 19 '21 09:04 markusthoemmes

However: As it stands, it's possible that a revision is never touched by the labeler, for example if Configuration and Route are created independently. Such a Revision stays in pending forever and as such, would never scale down. That may or may not be fine.

Is this accurate? The labeler walks the route's traffic block so I'd assume it wouldn't matter how the Route/Configuration is related.

dprotaso avatar Apr 19 '21 14:04 dprotaso

I guess I meant to point out, that it's possible for a revision to never be touched by the labeler. For example if you create a Route + Config and specify specific revisions in the Route. The same goes for a Service I guess.

markusthoemmes avatar Apr 19 '21 14:04 markusthoemmes

For example if you create a Route + Config and specify specific revisions in the Route.

Am I missing something I feel like that's not the case cause the list of revisions/configs to label is built from the Route's spec & status

https://github.com/knative/serving/blob/ae54012f0e01c5352b3168c3d12ab6da5dfe70e3/pkg/reconciler/labeler/metasync.go#L39-L41

dprotaso avatar Apr 19 '21 15:04 dprotaso

Right, so this list might miss Revisions, which then are in RouteState = Pending (which we would take as a signal to not scale down for minScale purposes) and they'd stay there forever.

If I'm misinterpreting that, fixing this issue is trivial :thinking:

markusthoemmes avatar Apr 19 '21 15:04 markusthoemmes

Just reproduced the behavior of the labeler I described: Running the MinScale test and adding a 30s sleep to the labeler caused it to never see the first revision and thus never change it's routing state.

markusthoemmes avatar Apr 20 '21 15:04 markusthoemmes

Is this is fixing process?

jwcesign avatar Aug 26 '21 15:08 jwcesign

@jwcesign it's not, nope!

markusthoemmes avatar Aug 26 '21 16:08 markusthoemmes

/triage accepted

dprotaso avatar Aug 02 '22 18:08 dprotaso