
Multiple Gateway listeners with different hostnames and same certificate not working in a browser session

Open jaynis opened this issue 1 year ago • 7 comments

I have configured two listeners on a Gateway for two different (sub)domains, e.g. foo.example.com and bar.example.com:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: envoy-gateway-https
  namespace: default
spec:
  gatewayClassName: envoy
  listeners:
  - allowedRoutes:
      namespaces:
        from: Same
    hostname: 'foo.example.k3d'
    name: example-foo
    port: 443
    protocol: HTTPS
    tls:
      certificateRefs:
      - group: ""
        kind: Secret
        name: example-wildcard-tls
        namespace: default
      mode: Terminate
  - allowedRoutes:
      namespaces:
        from: Same
    hostname: 'bar.example.k3d'
    name: example-bar
    port: 443
    protocol: HTTPS
    tls:
      certificateRefs:
      - group: ""
        kind: Secret
        name: example-wildcard-tls
        namespace: default
      mode: Terminate

If I try to reach foo.example.com or bar.example.com with a CLI tool such as curl, both domains work fine and the desired content is served correctly. However, if I try the same from within a browser session with Chrome or Firefox, only one of the two works and the other responds with 404s. Which one works depends on the order in which I visit them: if I open Chrome and navigate to foo.example.com, the desired content is served correctly, but if I go to bar.example.com afterwards I receive a 404. If I close the browser, open it again, and this time navigate to bar.example.com first, it is served correctly and subsequent requests to foo.example.com receive 404s. I therefore assume this has something to do with connection sharing / reuse, but this is only a wild guess.

In the Envoy debug logs I can see the failing request, and the reason for the 404 seems to be a route-matching problem:

[2024-02-22 15:32:32.711][26][debug][router] [source/common/router/router.cc:446] [Tags: "ConnectionId":"3","StreamId":"11246534096143794784"] no route match for URL '/'

If I remove the hostname constraint from the Gateway entirely, all domains work fine, both with curl and in a browser session.

jaynis avatar Feb 22 '24 15:02 jaynis

Does this perhaps involve HTTP/2 connection coalescing by the browser?

akevdmeer avatar Feb 22 '24 19:02 akevdmeer

hey @jaynis, it would help if you shared

  • the HTTPRoute attached to the Gateway
  • the result of running curl against Hostname1 and then against Hostname2
  • the result of the same requests from the browser, Hostname1 first and then Hostname2
  • the access logs from the Envoy Pod for all 4 requests (they look like https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#default-format-string)

arkodg avatar Feb 22 '24 20:02 arkodg

Does this perhaps involve HTTP/2 connection coalescing by the browser?

This is what I meant by "connection sharing / reuse". I looked into this further and I think HTTP/2 connection coalescing, in conjunction with my TLS certificate, which has SANs for both hosts (foo.example.com and bar.example.com), might be the root cause here. What I assume is happening is the following:

Envoy creates two listeners, one for each host and as my certificate is valid for both hosts, it can be used for each of the listeners. Now, when a request is sent to foo.example.com from a browser, it opens a single HTTP/2 connection to envoy with :authority: foo.example.com which will be attached to the corresponding listener and successfully handles the request for that host. However, subsequent requests to bar.example.com from the same browser session will reuse the same connection but with :authority: bar.example.com. Those requests end up at the listener for foo.example.com and as the authority does not match that host it results in a 404.
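The hypothesis above can be sketched as a toy model. This is not Envoy's or any browser's actual code: the `Listener` and `Client` classes and the `covers` helper are hypothetical names made up for illustration. The sketch assumes that the listener is picked once per connection from the SNI, that both hostnames resolve to the same address, and that a coalescing client reuses its connection whenever the certificate presented on it covers the new host.

```python
from dataclasses import dataclass


def covers(pattern: str, host: str) -> bool:
    """Exact match, or suffix match when the pattern starts with '*.'."""
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])
    return pattern == host


@dataclass
class Listener:
    hostname: str          # Gateway listener hostname
    cert_sans: list        # SANs of the certificate served by this listener
    route_hostnames: list  # hostnames of the HTTPRoutes attached to it


class Client:
    """Hypothetical client holding a single HTTP/2 connection."""

    def __init__(self, listeners, coalesce=True):
        self.listeners = listeners
        self.coalesce = coalesce
        self.current = None  # listener chosen when the connection was opened

    def get(self, host: str) -> int:
        reusable = (
            self.coalesce
            and self.current is not None
            and any(covers(san, host) for san in self.current.cert_sans)
        )
        if not reusable:
            # New TLS handshake: the listener is picked once, by SNI.
            match = next(
                (l for l in self.listeners if covers(l.hostname, host)), None
            )
            if match is None:
                raise ConnectionError(f"no listener for SNI {host}")
            self.current = match
        # Every request on this connection is routed inside that listener.
        chosen = self.current
        ok = covers(chosen.hostname, host) and host in chosen.route_hostnames
        return 200 if ok else 404


FOO, BAR = "foo.example.com", "bar.example.com"
shared = [FOO, BAR]  # one certificate with SANs for both hosts

split = [Listener(FOO, shared, [FOO]), Listener(BAR, shared, [BAR])]
flat = [Listener("*.example.com", shared, [FOO, BAR])]
own_certs = [Listener(FOO, [FOO], [FOO]), Listener(BAR, [BAR], [BAR])]

browser = Client(split)
print(browser.get(FOO), browser.get(BAR))  # 200 404

curl_like = Client(split, coalesce=False)  # fresh connection per request
print(curl_like.get(FOO), curl_like.get(BAR))  # 200 200

print(Client(flat).get(FOO), Client(flat).get(BAR))  # 200 200
```

In this model the first host visited always wins, which matches the order-dependent behaviour described in the report. Under the same assumptions, a single wildcard listener (`flat`) or one certificate per listener (`own_certs`) makes both hosts reachable from one client.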

@arkodg I will try to provide the access logs as per your request later, but this is what the debug log of a failing request looks like:

[2024-02-22 15:32:32.711][26][debug][http] [source/common/http/conn_manager_impl.cc:393] [Tags: "ConnectionId":"3"] new stream
[2024-02-22 15:32:32.711][26][debug][http] [source/common/http/conn_manager_impl.cc:1192] [Tags: "ConnectionId":"3","StreamId":"11246534096143794784"] request headers complete (end_stream=true):
':method', 'GET'
':authority', 'bar.example.com'
':scheme', 'https'
':path', '/'
'cache-control', 'max-age=0'
'sec-ch-ua', '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"'
'sec-ch-ua-mobile', '?0'
'sec-ch-ua-platform', '"Linux"'
'upgrade-insecure-requests', '1'
'user-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36'
'accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7'
'sec-fetch-site', 'none'
'sec-fetch-mode', 'navigate'
'sec-fetch-user', '?1'
'sec-fetch-dest', 'document'
'accept-encoding', 'gzip, deflate, br, zstd'
'accept-language', 'en-DE,en;q=0.9,de-DE;q=0.8,de;q=0.7,en-GB;q=0.6,en-US;q=0.5'

[2024-02-22 15:32:32.711][26][debug][http] [source/common/http/conn_manager_impl.cc:1175] [Tags: "ConnectionId":"3","StreamId":"11246534096143794784"] request end stream
[2024-02-22 15:32:32.711][26][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"3"] current connecting state: false
[2024-02-22 15:32:32.711][26][debug][router] [source/common/router/router.cc:446] [Tags: "ConnectionId":"3","StreamId":"11246534096143794784"] no route match for URL '/'
[2024-02-22 15:32:32.711][26][debug][http] [source/common/http/filter_manager.cc:1015] [Tags: "ConnectionId":"3","StreamId":"11246534096143794784"] Preparing local reply with details route_not_found
[2024-02-22 15:32:32.711][26][debug][http] [source/common/http/filter_manager.cc:1057] [Tags: "ConnectionId":"3","StreamId":"11246534096143794784"] Executing sending local reply.
[2024-02-22 15:32:32.711][26][debug][http] [source/common/http/conn_manager_impl.cc:1869] [Tags: "ConnectionId":"3","StreamId":"11246534096143794784"] encoding headers via codec (end_stream=true):
':status', '404'
'date', 'Thu, 22 Feb 2024 15:32:32 GMT'

[2024-02-22 15:32:32.711][26][debug][http] [source/common/http/conn_manager_impl.cc:1975] [Tags: "ConnectionId":"3","StreamId":"11246534096143794784"] Codec completed encoding stream.
[2024-02-22 15:32:32.711][26][debug][http2] [source/common/http/http2/codec_impl.cc:1450] [Tags: "ConnectionId":"3"] stream 21 closed: 0
[2024-02-22 15:32:32.711][26][debug][http2] [source/common/http/http2/codec_impl.cc:1513] [Tags: "ConnectionId":"3"] Recouping 0 bytes of flow control window for stream 21.

My HTTPRoutes are nothing special and look like this:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: foo-example # bar-example on the other route
  namespace: default
spec:
  hostnames:
  - foo.example.com # bar.example.com on the other route
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: envoy-gateway-https
    namespace: default
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: foo-service # bar-service on the other route 
      port: 8080
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /

jaynis avatar Feb 22 '24 22:02 jaynis

ah, you're hitting https://github.com/envoyproxy/envoy/issues/6767. Let's keep this issue open to track the upstream work; until then, a workaround is to flatten your listeners into one:

  - allowedRoutes:
      namespaces:
        from: Same
    hostname: "*.example.com"
    name: example
    port: 443
    protocol: HTTPS
    tls:
      certificateRefs:
      - group: ""
        kind: Secret
        name: example-wildcard-tls
        namespace: default
      mode: Terminate

Since you are also specifying the hostname on each HTTPRoute, anything apart from foo.example.com or bar.example.com will get a 404 with the NR (no route) response flag.
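A minimal sketch of why the wildcard listener does not open up every subdomain, assuming exact hostnames on the HTTPRoutes and the Gateway API rule that a listener hostname starting with `*.` is a suffix match. The function names are hypothetical and only illustrate the hostname intersection; this is not Envoy Gateway's implementation.

```python
def listener_accepts(listener_host: str, host: str) -> bool:
    """Suffix match for a '*.' listener hostname, exact match otherwise."""
    if listener_host.startswith("*."):
        return host.endswith(listener_host[1:])
    return listener_host == host


def routable_hosts(listener_host: str, route_hosts: list) -> list:
    """Route hostnames that survive the intersection with the listener."""
    return [h for h in route_hosts if listener_accepts(listener_host, h)]


def status(listener_host: str, route_hosts: list, authority: str) -> int:
    """200 if some route serves this authority, else 404 (no route)."""
    hosts = routable_hosts(listener_host, route_hosts)
    return 200 if authority in hosts else 404


routes = ["foo.example.com", "bar.example.com"]
for name in ("foo.example.com", "bar.example.com", "baz.example.com"):
    print(name, status("*.example.com", routes, name))
```

Only the hostnames listed on the routes are served; `baz.example.com` passes the listener's wildcard but matches no route, so it ends in a 404.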

arkodg avatar Feb 22 '24 22:02 arkodg

Yes, that sounds pretty much like what I am experiencing. Thank you for referencing that. Another workaround would be to use separate certificates per listener, but as one is not always in control of the certificates, I guess your suggestion is a bit more practical :+1:.

jaynis avatar Feb 22 '24 22:02 jaynis

Can we move this to backlog @arkodg ?

Xunzhuo avatar Mar 07 '24 07:03 Xunzhuo

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar Apr 07 '24 04:04 github-actions[bot]
