Configurable Inbound/Outbound HTTP/1 Connection Idle Timeouts
Set LINKERD2_PROXY_INBOUND_HTTP1_CONNECTION_POOL_IDLE_TIMEOUT and LINKERD2_PROXY_OUTBOUND_HTTP1_CONNECTION_POOL_IDLE_TIMEOUT during injection
This MR introduces the ability to configure inbound and outbound HTTP/1 connection pool idle timeouts on a per-service. Currently, linkerd2-proxy has default idle timeouts of 3 seconds for both inbound and outbound HTTP/1 connections.
For example, this change is beneficial for services requiring persistent TCP connections across operational rounds.
There was the intention to do something similar, as commented some time ago here https://github.com/linkerd/linkerd2/issues/9273#issuecomment-1241116227:
Proposal
2.13.0Make this timeout configurable via the policy controller, controlled by an annotation set on a Server or Pod.
This MR aims to add the corresponding annotations in order to make it configurable via annotations in the pod.
Thanks @txabman42! Structurally, this change looks good but I want to make sure this won't conflict with future plans to make this configurable through the proxy API (as indicated by https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/src/env.rs#L286-L294). @olix0r do you have any thoughts on how near-term making this configurable through the API is?
@olix0r could you take an eye on this MR?
@mateiidavid could you review it again? TY in advance
@txabman42 Happy new year! Think you can resolve the conflict and take a look at the failing test here? 🙂
@kflynn Conflict resolved and tests fixed, and happy new year!!!
Hi, @txabman42! Thanks for your patience. It's taken us quite a bit of time to review this since we've been focused on upcoming releases. The change itself looks good, but I think we'd like to have a bit more time to think through the contract that we're introducing here. This includes the annotation names, the annotation key domain, and whether this should be configured globally, per-workload, or both.
In the mean time, since this clearly addresses a problem you're having, I wanted to suggest a stop gap solution. We've introduced a feature that allows us to use static configuration overrides (through environment variables). These would apply globally; you can use them to tweak the idle timeouts.
It's not documented but you can have a look at:
@txabman42 going to close this PR since it's been open for a long time. Please let me know if the workaround I mentioned above is good for you. If you want this re-open, let us know. :)