Pods created due to initialScale don't follow scaleDownDelay behavior
In what area(s)?
/area autoscale
What version of Knative?
0.19.0
Expected Behavior
Pods created due to initialScale should follow scaleDownDelay behavior.
Actual Behavior
Pods created due to initialScale don't follow scaleDownDelay behavior.
Steps to Reproduce the Problem
Create a revision with the following configuration:
autoscaling.knative.dev/initialScale: "3"
autoscaling.knative.dev/scaleDownDelay: "10m"
Check that three pods are created, but two of them are terminated right after the revision is ready, which is likely too soon for a 10m scaleDownDelay.
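For concreteness, a minimal Service manifest carrying these annotations might look roughly like the sketch below (the service name and image are hypothetical placeholders, not from the original report):

```yaml
# Hypothetical reproduction manifest; only the two autoscaling annotations matter here.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: initial-scale-test                        # placeholder name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/initialScale: "3"
        autoscaling.knative.dev/scaleDownDelay: "10m"
    spec:
      containers:
        - image: gcr.io/example/app:latest        # placeholder image
```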
This is a result of the fact that initialScale is a reconciler artifact and scaleDownDelay is an autoscaler artifact.
So the history in the scale-down window never even records 3, and it scales down to 1 immediately, given the lack of traffic.
I see. Is a redesign of this implementation already on Knative's roadmap?
If I understood correctly, the usefulness of initialScale today depends on the right amount of traffic being routed to the new revision before the KPA decides to control the new revision's replica count, and this can depend on "luck".
In a blue-green-like deployment, I need a new revision to receive traffic only if it has enough instances to handle the load the previous one is receiving. From a usability perspective, there could be a parameter for that.
Without the proper initialScale behavior, I will also need to change minScale a couple of times to try to guarantee a smooth deployment.
There are a few things here:
- If you have initial scale and you send traffic warranting 3 pods — it should not scale down, at least for some time.
- That should not be "luck". If you're not routing enough traffic, you should not have 3 pods. But in general the usefulness of initialScale is borderline: its main initial purpose was actually to permit 0, and positive values were added as a long-fought political compromise.
- You should look at the newest feature — gradual rollout, a strictly time-bound latest-revision rollout mechanism, which is to be released in 0.20. Initially all rollouts will use the same config map value, but in the future API changes will be done to support a per-configuration rollout duration. This is supposed to solve exactly those issues, though it requires you to do the computations.
- Regarding 1 and 2: Suppose that a new revision is created with initialScale 3, then the KPA decides to downscale to 1, and only then is traffic routed to the new revision in a blue-green-like deployment. Here I assumed that these things can occur in that order, but correct me if I am wrong. In that case the traffic will observe a "degraded" (at least higher-latency) service until the KPA scales back up to 3.
- A more out-of-the-box gradual rollout would be great! Is there documentation about it? When will 0.20 be released?
0.20 will be in January. No docs yet, but if you install from head, I can tell you how to turn this on.
I'd love beta testers :)
It would be great to have access to the gradual rollout feature. Unfortunately I need a stable solution sooner.
I've tried to edit a revision's autoscaling.knative.dev/minScale annotation; however, the change was not propagated to the related KPA.
Is that intended behavior?
I know that revisions are supposed to be immutable, but I was expecting this change to carry over to the related KPA, since this property is placed in an annotation and not in the spec.
I could directly change the KPA's autoscaling.knative.dev/minScale annotation, but I don't know if that is safe.
My main concern is that a revision's controller could overwrite this change sometime later.
Can that occur?
No, you can't change revisions at all. They are immutable. You need to change the parent Service/Configuration. If you change the KPA, it will be reconciled away too.
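For illustration, a minimal sketch of what changing the parent Service might look like, so the annotation flows from the revision template into the new revision and its KPA (the service name, value, and image below are placeholders, not from this thread):

```yaml
# Hypothetical sketch: set the annotation on the parent Service's revision template.
# Since revisions are immutable, applying this creates a new revision carrying the
# annotation, instead of mutating the existing Revision or PodAutoscaler.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service                            # placeholder name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "3" # placeholder value
    spec:
      containers:
        - image: gcr.io/example/app:latest    # placeholder image
```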
OK.
Wouldn't it be interesting to block changes to the autoscaling.knative.dev/minScale annotation in the revision's admission webhook?
It seems that this kind of property would be better placed in the revision's spec anyway, or am I missing something?
🤷 we had immutable checks before but they were more hassle than they were worth.
I have to admit, when I originally came up with the ScaleDownDelay idea I think I had in mind that it should work nicely with initialScale so I personally would be quite up for fixing this. We could move the windowing in to the reconcile and out of the autoscaler, in theory, so that it occurs after - and properly considers - initialScale 🤔. Wdyt @vagababov @markusthoemmes ?
It feels natural in there, though... need to think more :)
I'd suggest doing initialscale in autoscaler but we only just got those worms back in the can 😅
It was super hard, that's why we did it in the KPA. It is probably easier to add windowing to the KPA... especially since it'll automatically take into account minScale and what not.
yah exactly, so that means if we want initialScale to be windowed (which I think we do..) we either have to re-open that can of worms (no thanks ;)) or move the window to where initialScale is, I think, even if it's slightly unnatural. If we agree it's the right behaviour, though.....
The problem there is that the reconciler has no state, and I am not sure how we want to introduce that there.
/assign @vagababov @julz
I am going to assign y'all to drive some sort of resolution on this to get it off the triage backlog 😇
sounds good, added to the future items sticky in the autoscaler wg agenda to talk through
/area autoscaling /triage accepted
@evankanderson: The label(s) area/autoscaling cannot be applied, because the repository doesn't have them.
/area autoscale
/unassign
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
/remove-lifecycle stale
/lifecycle frozen
Does anyone have any workarounds for this, besides patching "minReplicas" on existing Service objects?