serving if revision is inactive, scale to zero instead of waiting for last pod retention

Fixes #13812

Proposed Changes

if revision is inactive, scale to zero instead of waiting for last pod retention

Release Note

Apr 25 '24 03:04 eddy-oum

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: eddy-oum Once this PR has been reviewed and has the lgtm label, please assign creydr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Apr 25 '24 03:04 knative-prow[bot]

Welcome @eddy-oum! It looks like this is your first PR to knative/serving 🎉

Apr 25 '24 03:04 knative-prow[bot]

Hi @eddy-oum. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Apr 25 '24 03:04 knative-prow[bot]

/ok-to-test

Apr 25 '24 14:04 dprotaso

/hold

I'm going on vacation but will want to review this in detail when I'm back in 2 weeks. I made some code changes in this area of the code before and broke some stuff so I want to be cautious here.

cc @ReToCode @skonto for their reviews in the meantime

Apr 25 '24 14:04 dprotaso

Note with v1.14 coming out this week next release is in July so there's need to rush

Apr 25 '24 14:04 dprotaso

If I understand the discussion in https://github.com/knative/serving/issues/13812 correctly I think this is fine. Dave is now on PTO, as he has done the recent larger rework in that area, let's also wait for his opinion.

Apr 26 '24 08:04 ReToCode

Does this change respect the scale down delay or we are already after that period?

Apr 26 '24 13:04 skonto

Does this change respect the scale down delay or we are already after that period?

~I think it is the latter; scale down delays are considered here, before applying handleToScaleZero~

edit: seems like current code doesn't respect the scale down delay, thx.

Apr 27 '24 03:04 eddy-oum

~I am wondering if I should do the reachability check together with the last pod retention check, instead of returning early, so that unreachable revisions can go through other checks.~

edit: never mind the above approach doesn't work.

I think another approach (instead of returning early in handleScaleZero) would be to return 0 in lastPodRetention(pa *autoscalingv1alpha1.PodAutoscaler, cfg *autoscalerconfig.Config) if pa is unreachable.

func lastPodRetention(pa *autoscalingv1alpha1.PodAutoscaler, cfg *autoscalerconfig.Config) time.Duration {
	if pa.Spec.Reachability == autoscalingv1alpha1.ReachabilityUnreachable {
		return 0
	}
	d, ok := pa.ScaleToZeroPodRetention()
	if ok {
		return d
	}
	return cfg.ScaleToZeroPodRetentionPeriod
}

this approach lets the pa go through other checks, especially the scale to zero grace period.

	}, {
		label:         "scale to zero, if revision is unreachable do not wait for last pod retention",
		startReplicas: 1,
		scaleTo:       0,
		wantReplicas:  0,
		wantScaling:   true,
		paMutation: func(k *autoscalingv1alpha1.PodAutoscaler) {
			paMarkInactive(k, time.Now().Add(-gracePeriod))
			WithReachabilityUnreachable(k)
		},
		configMutator: func(c *config.Config) {
			c.Autoscaler.ScaleToZeroPodRetentionPeriod = 10 * gracePeriod
		},
	}, {
		label:         "revision is unreachable, but before deadline",
		startReplicas: 1,
		scaleTo:       0,
		wantReplicas:  0,
		wantScaling:   false,
		paMutation: func(k *autoscalingv1alpha1.PodAutoscaler) {
			paMarkInactive(k, time.Now().Add(-gracePeriod+time.Second))
			WithReachabilityUnreachable(k)
		},
		configMutator: func(c *config.Config) {
			c.Autoscaler.ScaleToZeroPodRetentionPeriod = 10 * gracePeriod
		},
		wantCBCount: 1,

current implementation will fail the second test, but returning 0 in lastPodRetention for unreachable revisions would pass both tests.

What are your thoughts @ReToCode @skonto? Thanks!

May 01 '24 02:05 eddy-oum

/retest

May 30 '24 04:05 eddy-oum

/hold cancel

May 30 '24 19:05 dprotaso

/retest

Jun 02 '24 07:06 eddy-oum

@eddy-oum: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
certmanager-integration-tests_serving_main	b4f6a8a8587e43788371f4fe1c9044037a577182	link	true	`/test certmanager-integration-tests`

Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Jun 02 '24 07:06 knative-prow[bot]

/lgtm /approve

Jun 02 '24 15:06 dprotaso

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, eddy-oum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dprotaso]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Jun 02 '24 15:06 knative-prow[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.77%. Comparing base (b0dfed2) to head (fe7d6a1).

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #15161   +/-   ##
=======================================
  Coverage   84.76%   84.77%           
=======================================
  Files         218      218           
  Lines       13478    13480    +2     
=======================================
+ Hits        11425    11428    +3     
+ Misses       1686     1685    -1     
  Partials      367      367

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jun 02 '24 15:06 codecov[bot]

serving serving copied to clipboard

if revision is inactive, scale to zero instead of waiting for last pod retention

Proposed Changes

Codecov Report

serving
serving copied to clipboard