Websocket Timeout
Websocket connections seem to be timing out every 16s. Expected behavior is that the websocket connection maintains Pending status indefinitely and does not time out. This issue seems to occur after upgrading to Contour 1.15.0.
- Contour version: 1.15.0
- Kubernetes version (use `kubectl version`): 1.20.4
- Kubernetes installer & version: kubeadm
- Cloud provider or hardware configuration: AWS
- OS (e.g. from `/etc/os-release`): CentOS 8
May be related to this: https://github.com/envoyproxy/envoy/issues/16129
Hmm, interestingly, I can't reproduce this when using the following HTTPProxy:
```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: wstest
  namespace: wstest
  annotations:
    kubernetes.io/ingress.class: contour
spec:
  virtualhost:
    fqdn: wstest.youngnick.dev
    tls:
      secretName: wstestcert
  routes:
  - conditions:
    - prefix: /
    enableWebsockets: true
    services:
    - name: wstest
      port: 8010
```
This was with Contour 1.15.0, and Envoy 1.18.3, as per our example YAML.
If this is related to envoyproxy/envoy#16129, this would make sense, as that issue requires the websocket route to be on a sub-path. I'll try that next.
Okay, I changed the HTTPProxy as follows:
```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: wstest
  namespace: wstest
  annotations:
    kubernetes.io/ingress.class: contour
spec:
  virtualhost:
    fqdn: wstest.youngnick.dev
    tls:
      secretName: wstestcert
  routes:
  - conditions:
    - prefix: /
    enableWebsockets: true
    services:
    - name: echoserver
      port: 80
  - conditions:
    - prefix: /websocket
    enableWebsockets: true
    services:
    - name: wstest
      port: 8010
```
And I still can't get this to reproduce. I also tried with the TLS config removed, same deal.
@ccravens, could you try your client against this setup? I'll leave wss://wstest.youngnick.dev/websocket running overnight my time so you should have a chance to have a go. I think a client issue is very unlikely, but this would rule out any problems between your client and your server. Then we can try to figure out what's different between your HTTPProxy and mine. Any chance you could post it here (or message it to me on Slack or something)?
Thanks @youngnick! What I'm seeing on my end: there are essentially two services, each with its own HTTPProxy, running on different pods in different namespaces, that are experiencing exactly the behavior described in this issue. The service names are 1) app and 2) IDE. Please find the specifics of each configuration below.
Screenshot of the two services both experiencing a 16s reconnect (`ws` is app, and `services` is IDE):
Service 1 - App
```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: app
  namespace: app
spec:
  routes:
  - conditions:
    - prefix: /
    services:
    - name: app
      port: 80
  - conditions:
    - prefix: /api
    services:
    - name: app
      port: 3000
  - conditions:
    - prefix: /ws
    enableWebsockets: true
    services:
    - name: app
      port: 3000
  virtualhost:
    fqdn: app.example.com
    tls:
      secretName: app-certificate
```

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
  namespace: app
spec:
  ports:
  - name: frontend
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: api
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: app
  type: ClusterIP
```
Service 2 - IDE
```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: ide
  namespace: ide
spec:
  routes:
  - conditions:
    - prefix: /
    enableWebsockets: true
    services:
    - name: ide
      port: 8080
  virtualhost:
    fqdn: ide.example.com
    tls:
      secretName: ide-wildcard-certificate
```

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ide
  namespace: ide
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: ide
  sessionAffinity: None
  type: ClusterIP
```
In office hours we were able to find this Envoy commit: https://github.com/envoyproxy/envoy/pull/15585. If there is no `max_stream_duration` set, the route response timeout from the upstream is used, which defaults to 15s.
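For reference, here is roughly where that setting lives in Envoy's own config: a minimal hand-written sketch of an HTTP connection manager fragment (this is not config that Contour generates, and the `3600s` value is arbitrary):

```yaml
# Minimal sketch only; field names are from Envoy's v3 HttpConnectionManager
# and HttpProtocolOptions protos, but this is not Contour-rendered config.
- name: envoy.filters.network.http_connection_manager
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
    stat_prefix: ingress_http
    common_http_protocol_options:
      # If unset, there is no limit at this level; per the discussion above,
      # the route's response timeout (default 15s) then governs the stream.
      max_stream_duration: 3600s
    route_config: {}  # elided for brevity
    http_filters:
    - name: envoy.filters.http.router
```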
We had @ccravens deploy Envoy 1.17 and saw the reconnects go away. With 1.18 we were able to change the `response` timeout field (https://projectcontour.io/docs/v1.15.1/config/api/#projectcontour.io/v1.TimeoutPolicy) to something larger than 15s and see it reflected in the reconnects (setting it to `infinity` of course made the reconnects go away entirely).
So, action items and things to discuss:
- `max_stream_duration` is an HTTPConnectionManager `CommonHttpProtocolOptions` field (so it is not unique to a route); do we allow this to be configured?
- Do we instead set a default request timeout when we know we have a websocket route?
- Do we just make users configure their request timeout explicitly when using websockets (the case after Envoy 1.18, because of the above change)? A sketch of that option follows this list.
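To make the last option concrete, here is a minimal sketch of a websocket route with an explicit timeout via the existing HTTPProxy `timeoutPolicy` field; the names (`wsapp`, `/ws`) and the `1h` value are illustrative, not taken from the configs in this thread:

```yaml
# Illustrative only: a websocket route with an explicit response timeout,
# which (per the discussion above) effectively bounds the stream duration
# on Envoy >= 1.18. Names and values here are hypothetical.
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: wsapp
  namespace: default
spec:
  virtualhost:
    fqdn: wsapp.example.com
  routes:
  - conditions:
    - prefix: /ws
    enableWebsockets: true
    timeoutPolicy:
      response: 1h  # or "infinity" to disable the timeout entirely
    services:
    - name: wsapp
      port: 8080
```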
We're still a little confused about why @youngnick was not able to repro the issue; we may need to check for differences between our environments.
Just came back to this one:
- I think we can allow configuration of `max_stream_duration`, and we should especially note its usefulness for websockets in both its own docstring and the websocket field's docstring.
- Setting a default request timeout sounds reasonable when we have a websocket route, but I am not sure what it should be. @ccravens, do you have any thoughts here?
- This is where we are at right now; I hope we can do better somehow. I think that at least having a note in the websocket field docs that it's important to tune some timeouts would be helpful.
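If we did expose it, one purely hypothetical shape would be a new key alongside the existing entries in the Contour config file's `timeouts` block (this key does not exist as of v1.15; it only sketches the first proposal above):

```yaml
# Hypothetical sketch only: max-stream-duration is NOT a real Contour config
# key as of v1.15; it illustrates exposing Envoy's
# common_http_protocol_options.max_stream_duration.
timeouts:
  request-timeout: infinity   # existing field, shown for context
  max-stream-duration: 1h     # hypothetical new field
```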
> Just came back to this one:
> - I think we can allow configuration of `max_stream_duration`, and we should especially note its usefulness for websockets in both its own docstring and the websocket field's docstring.
Looks like `max_stream_duration` is useful not only for websockets, but for gRPC streams too. I'm trying to set up an HTTPProxy for a gRPC streaming service, and I've noticed that my service receives a stream reset after approximately 15s. Is there currently a way to tweak `max_stream_duration` for a particular service/route in HTTPProxy?
@echupriyanov I believe that per https://github.com/projectcontour/contour/issues/3692#issuecomment-845351552, as a workaround right now, if you set the route's `response` timeout to something larger than 15s, that will effectively be the stream timeout value for that particular route. So something like:
```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: foo
  namespace: default
spec:
  routes:
  - conditions:
    - prefix: /
    services:
    - name: foo
      port: 8080
    timeoutPolicy:
      response: 1h
...
```
Note that using `response: infinity` would completely disable the timeout, though you may want something large but non-infinite in practice. Let us know if that works for you.
@skriss Great, thanks! Yes, setting the response timeout to a large value does the trick.
I don't think this is being investigated any longer; removing the label and closing since there hasn't been recent activity.