consul Configure envoy's idle_timeout for TCP

Feature Description

Provide a way or an example to set envoy's idle_timeout

Use Case(s)

I’m having a TCP keep-alive issue: idle connections are disconnected by the proxy after an hour, and I want to override idle_timeout either in the service definition or globally.

I saw some examples like the one below, but it’s not clear to me whether this envoy_public_listener_json will override the service name/port that was defined at the sidecar_service level.

{
    "service": {
      "name": "test",
      "connect": {
        "sidecar_service": {
          "port": "xxxx",
          "proxy": {
            "upstreams": [
              {
                "destination_name": "xxxxx",
                "local_bind_port": "xxxxx",
                "config": {
                    envoy_public_listener_json= <<EOL
                        {
                            "name": "test",
                            "address": {
                                "socket_address": {
                                    "address": "0.0.0.0",
                                    "port_value": "xxxx"
                                }
                            },
                            "filter_chains": [
                                {
                                    "filters": [
                                        {
                                            "name": "envoy.tcp_proxy",
                                            "config": {
                                                "idle_timeout": "7200s"
                                            }
                                        }
                                    ]
                                }
                            ]
                        }
                    EOL
                }
              }
            ]
          }
        }
      }
    }
}

Aug 16 '20 09:08 malhomaid

@mhomaid1 Thanks for using Consul Service Mesh. Though several Envoy features are configurable directly through Consul, other less common ones, like the idle_timeout option you mention in this issue, are not exposed directly.

Envoy exposes multiple types of listeners. The envoy_public_listener_json config option overrides the single public listener that accepts inbound connections. Each upstream you define is also associated with a corresponding Envoy listener. The configuration for those listeners is overridden through a different option, [envoy_listener_json](https://www.consul.io/docs/connect/proxies/envoy#envoy_listener_json).
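To make the distinction between the two options concrete, here is a minimal sketch that builds a service definition as JSON and places each override where it applies: envoy_listener_json inside an upstream's config, envoy_public_listener_json in the proxy-level config. The service names, ports, and the `service_definition` helper are hypothetical, and the listener bodies are placeholders. Because the outer document is JSON, the listener itself has to be embedded as an escaped JSON string, which `json.dumps` takes care of.

```python
import json


def service_definition(public_listener=None, upstream_listener=None):
    """Build a Consul service definition dict (hypothetical names and ports)."""
    upstream = {"destination_name": "db", "local_bind_port": 9191}
    if upstream_listener is not None:
        # Override for the listener that fronts this one upstream.
        upstream["config"] = {"envoy_listener_json": json.dumps(upstream_listener)}

    proxy = {"upstreams": [upstream]}
    if public_listener is not None:
        # Override for the single public (inbound) listener.
        proxy["config"] = {"envoy_public_listener_json": json.dumps(public_listener)}

    return {
        "service": {
            "name": "test",
            "port": 8080,
            "connect": {"sidecar_service": {"proxy": proxy}},
        }
    }


# Placeholder listener body; a real one would be copied from the config dump.
definition = service_definition(upstream_listener={"name": "db:127.0.0.1:9191"})
print(json.dumps(definition, indent=2))
```

The printed document can be saved as a service definition file; the listener passed in would normally be the full listener copied from Envoy rather than the one-field placeholder used here.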

We recommend the following steps to use escape hatch functionality correctly:

1. Determine which listener you want to override using the existing escape hatch mechanism. Given your example above, it looks like you are trying to override the idle_timeout for an upstream listener rather than the public listener for inbound requests.
2. Configure Consul to set up the listener without an escape hatch first.
3. Copy the generated listener from the Envoy admin API (http://localhost:19000/config_dump). You should be able to identify the listener in the overall config dump: its name is prefixed by the name of the upstream service.
4. Edit the JSON you copied to add the missing flag (idle_timeout in this case). Note that if you are updating the public listener, you'll need to remove the TLS context and the RBAC/authz filters (this doesn't apply if you are updating the listener config for an upstream).
5. Drop the result into the appropriate escape hatch override.
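The editing steps can be sketched in Python. This is only an illustration under assumptions: the sample listener is hypothetical and merely shaped like one copied from the config dump, and the helper names `find_listener` and `set_tcp_idle_timeout` are made up for this example. The sketch handles both the older `config` and the newer `typed_config` filter keys, and writes the timeout in the seconds form used by protobuf JSON durations (`"7200s"` for two hours).

```python
import copy
import json

# Names Envoy has used for the TCP proxy network filter (older and newer).
TCP_PROXY_NAMES = {"envoy.tcp_proxy", "envoy.filters.network.tcp_proxy"}


def find_listener(listeners, upstream_name):
    """Return the first listener whose name starts with the upstream name."""
    for listener in listeners:
        if listener.get("name", "").startswith(upstream_name):
            return listener
    raise LookupError(f"no listener found for upstream {upstream_name!r}")


def set_tcp_idle_timeout(listener, timeout):
    """Return a copy of the listener with idle_timeout set on each TCP proxy filter."""
    patched = copy.deepcopy(listener)
    for chain in patched.get("filter_chains", []):
        for flt in chain.get("filters", []):
            if flt.get("name") not in TCP_PROXY_NAMES:
                continue
            # Older dumps keep filter settings under "config", newer under "typed_config".
            key = "typed_config" if "typed_config" in flt else "config"
            flt.setdefault(key, {})["idle_timeout"] = timeout
    return patched


# Hypothetical listeners, as if copied out of the /config_dump response.
copied = [{
    "name": "db:127.0.0.1:9191",
    "address": {"socket_address": {"address": "127.0.0.1", "port_value": 9191}},
    "filter_chains": [{"filters": [{
        "name": "envoy.tcp_proxy",
        "config": {"cluster": "db", "stat_prefix": "upstream_db_tcp"},
    }]}],
}]

patched = set_tcp_idle_timeout(find_listener(copied, "db"), "7200s")
override_json = json.dumps(patched, indent=2)
print(override_json)
```

The printed JSON is what would go into the escape hatch override for the upstream. For the public listener, the TLS context and RBAC/authz filters would still need to be removed by hand, as noted in the steps.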

We continue to add first-class configuration support for more commonly used Envoy features, but the above set of steps should help you override fields like idle_timeout.

Hope this helps.

Sep 29 '20 23:09 preetapan

@preetapan Thank you, I will give it a try.

Oct 02 '20 13:10 malhomaid

It's working for me, thank you.

I'm facing keep-alive issues with Elasticsearch (it seems they are changing the keep-alive config behavior in master), RabbitMQ, and probably any application that opens long-lived connections without data flowing all the time. I'm sure there is a reason behind Envoy's default idle_timeout (1 hour), but it causes a problem because applications usually respect the kernel's tcp_keepalive_time parameter, which defaults to 2 hours.

I tried to lower tcp_keepalive_time but it didn't help.
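The timing mismatch with the default values can be written out as a small sketch. The constant and function names are hypothetical; the values are the defaults mentioned above. It only compares the two timers and does not claim that keep-alive probes reset the proxy's idle timer (lowering tcp_keepalive_time did not help here).

```python
# Defaults as described in this thread, in seconds.
ENVOY_TCP_IDLE_TIMEOUT_S = 60 * 60         # Envoy idle_timeout: 1 hour
KERNEL_TCP_KEEPALIVE_TIME_S = 2 * 60 * 60  # Linux tcp_keepalive_time: 2 hours


def proxy_closes_before_first_probe(idle_timeout_s, keepalive_time_s):
    """True if the proxy's idle timer expires before the first keep-alive probe is due."""
    return idle_timeout_s < keepalive_time_s


gap_s = KERNEL_TCP_KEEPALIVE_TIME_S - ENVOY_TCP_IDLE_TIMEOUT_S
print(proxy_closes_before_first_probe(ENVOY_TCP_IDLE_TIMEOUT_S, KERNEL_TCP_KEEPALIVE_TIME_S))
print(f"idle connection is closed {gap_s} seconds before the first probe would be sent")
```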

I prefer to fix the problem in Envoy/Consul rather than fixing it everywhere else because everything was working fine before.

Using the escape hatch functionality works fine, but it would be great if this were supported by Consul directly. If you are planning to add support for idle_timeout, I would like to contribute.

Oct 12 '20 08:10 malhomaid

I could use this too. It looks like the latest version of Consul implements idle_timeout for HTTP requests, but there is still no way to set the idle_timeout for TCP.

Dec 26 '22 22:12 aoskotsky-amplify

I'm also facing dropped connections with all my TCP services through the mesh (postgres, redis). Most of the time, the client re-establishes the connection right away, but it can cause transient issues (e.g., a health check fails, the app disappears from Traefik, or it is restarted by Nomad when using check_restart).

Jan 27 '25 14:01 dani
