serving icon indicating copy to clipboard operation
serving copied to clipboard

Fix timeout handlers to max duration instead of time to first byte

Open nader-ziada opened this issue 3 years ago • 24 comments

Fixes #12634

Proposed Changes

  • Use max duration instead of time to first byte in timeout handler
  • add RevisionRequestStartTimeout field to use a first byte timeout for a request
  • expose RevisionIdleTimeout

Release Note

Revision timeouts now has the following three values, TimeoutSeconds represents the max duration a request can take, ResponseStartTimeoutSeconds is the timeout allowed before a request starts responding and IdleTimeoutSeconds is the max duration a request can remain open without getting any data. 

/hold

nader-ziada avatar May 26 '22 20:05 nader-ziada

That's not a good idea in general — how are you going to deal with long lasting streaming requests? Is there an issue attached?

vagababov avatar May 26 '22 20:05 vagababov

Codecov Report

Merging #12970 (8c0f7bd) into main (5bba016) will decrease coverage by 0.08%. The diff coverage is 71.05%.

@@            Coverage Diff             @@
##             main   #12970      +/-   ##
==========================================
- Coverage   86.72%   86.64%   -0.09%     
==========================================
  Files         196      196              
  Lines       14463    14493      +30     
==========================================
+ Hits        12543    12557      +14     
- Misses       1624     1638      +14     
- Partials      296      298       +2     
Impacted Files Coverage Δ
pkg/apis/serving/v1/revision_types.go 100.00% <ø> (ø)
pkg/queue/sharedmain/main.go 0.64% <0.00%> (-0.01%) :arrow_down:
pkg/http/handler/timeout.go 83.44% <70.58%> (-6.36%) :arrow_down:
pkg/apis/config/defaults.go 84.48% <84.21%> (-1.24%) :arrow_down:
pkg/reconciler/revision/resources/queue.go 98.23% <100.00%> (+0.09%) :arrow_up:
pkg/reconciler/configuration/configuration.go 83.67% <0.00%> (-1.54%) :arrow_down:
pkg/autoscaler/scaling/multiscaler.go 87.24% <0.00%> (-1.35%) :arrow_down:
pkg/autoscaler/statforwarder/leases.go 73.95% <0.00%> (+1.56%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar May 26 '22 20:05 codecov[bot]

That's not a good idea in general — how are you going to deal with long lasting streaming requests? Is there an issue attached?

I'm just starting to work on this based on this issue here: https://github.com/knative/serving/issues/12634#issuecomment-1137765904 , its not complete yet

nader-ziada avatar May 26 '22 20:05 nader-ziada

@vagababov would appreciate your feedback since you know this area of the code pretty well

nader-ziada avatar May 26 '22 20:05 nader-ziada

Mostly as above: you have no way of killing failed/deadlocked, but longer running tasks without TTFB. Reading the issue, it seems that there's some inconsistency with spec. But also removing this might break some backward compatibility :)

vagababov avatar May 26 '22 20:05 vagababov

long running streams (ie. websockets) is already broken

you have no way of killing failed/deadlocked, but longer running tasks without TTFB

I want to introduce IdleTimeout to handle the TTFB - we can introduce a sane default there - min(10s,Spec.TimeoutSeconds) or something

See the issue here: https://github.com/knative/serving/issues/12634

dprotaso avatar May 26 '22 20:05 dprotaso

We must go deeper! :) Idle timeout is certainly useful in some scenarios, and not for others — so your default is infinite or huge? e.g. I do on demand video conversion. All in all, all three might be necessary.

But yeah, my main concern would be not breaking existing users.

vagababov avatar May 26 '22 20:05 vagababov

I still think the idea of profiles we had (not quite sure if it went anywhere, I look only at things that catch my eye) — would provide canned sets of these settings, but oh well :)

vagababov avatar May 26 '22 20:05 vagababov

All in all, all three might be necessary.

Yeah that's fair

But yeah, my main concern would be not breaking existing users.

OSS broke portability and I think that's critical to fix. Providing alternative knobs (idle, ttfb, timeout) should give users enough flexibility.

dprotaso avatar May 26 '22 20:05 dprotaso

All in all, all three might be necessary.

Yeah that's fair

But yeah, my main concern would be not breaking existing users.

OSS broke portability and I think that's critical to fix. Providing alternative knobs (idle, ttfb, timeout) should give users enough flexibility.

how about we go back to keeping all three and allowing the user to choose the behaviour, for example enabling or disabling the ttfb?

nader-ziada avatar May 26 '22 21:05 nader-ziada

how about we go back to keeping all three and allowing the user to choose the behaviour, for example enabling or disabling the ttfb?

I would prefer three different attributes - vs. some extra annotation that tweaks the semantics of TimeoutSeconds

dprotaso avatar May 27 '22 13:05 dprotaso

how about we go back to keeping all three and allowing the user to choose the behaviour, for example enabling or disabling the ttfb?

I would prefer three different attributes - vs. some extra annotation that tweaks the semantics of TimeoutSeconds

so add MaxDuration back to the spec?

nader-ziada avatar May 27 '22 13:05 nader-ziada

so add MaxDuration back to the spec?

No - TimeoutSeconds is MaxDuration

dprotaso avatar May 27 '22 14:05 dprotaso

so add MaxDuration back to the spec?

No - TimeoutSeconds is MaxDuration

I'm confused, TimeoutSeconds is currently used by timeoutFirstByte, and we were going to remove that and use it for MaxDuration but now thinking we should not remove timeoutFirstByte after all.

Are you saying use TimeoutSeconds for max duration and add a new fields for timeoutFirstByte and idleTimeout?

nader-ziada avatar May 27 '22 16:05 nader-ziada

Are you saying use TimeoutSeconds for max duration and add a new fields for timeoutFirstByte and idleTimeout?

yup

TimeoutSeconds should behave as the spec originally intended which is max duration

dprotaso avatar May 27 '22 16:05 dprotaso

/retest

nader-ziada avatar May 28 '22 00:05 nader-ziada

/test upgrade-tests_serving_main

nader-ziada avatar May 30 '22 14:05 nader-ziada

The upgrade tests are failing because we are added a new field, related issue here https://github.com/knative/serving/issues/11980

nader-ziada avatar May 31 '22 15:05 nader-ziada

/test upgrade-tests_serving_main

nader-ziada avatar Jul 13 '22 19:07 nader-ziada

/retest

nader-ziada avatar Jul 19 '22 03:07 nader-ziada

/retest

nader-ziada avatar Jul 20 '22 19:07 nader-ziada

/retest

nader-ziada avatar Jul 20 '22 22:07 nader-ziada

@dprotaso ready for review

nader-ziada avatar Aug 02 '22 18:08 nader-ziada

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, nader-ziada

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

knative-prow[bot] avatar Aug 11 '22 15:08 knative-prow[bot]

/hold cancel

nader-ziada avatar Aug 11 '22 15:08 nader-ziada