Bug: race condition when using mergeGateways with multiple gateways?
Description:
I have the following installation:
% kubectl get gateway -A
NAMESPACE              NAME       CLASS         ADDRESS         PROGRAMMED   AGE
echoserver             foobar     eg-internal   10.222.156.49   True         34m
envoy-gateway-system   internal   eg-internal   10.222.156.49   True         50m
Full YAML spec: https://gist.github.com/zetaab/8caa34f5072d5a8efc5c2425c331c561
HTTPRoutes: https://gist.github.com/zetaab/149545f3e0ae17c0b925bafd3512d1eb
When I add HTTPRoutes and the Envoy proxies restart, all services randomly become unavailable. When the Envoy pods start, I can see the following in the logs:
[2024-03-18 06:46:10.571][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/https'
[2024-03-18 06:46:10.573][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'
In this case all services work, including services that come from gateways other than the one that owns the https listener. However, when the log says:
[2024-03-18 07:17:24.768][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'echoserver/foobar/https-foo'
[2024-03-18 07:17:24.769][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'
Nothing works:
% curl https://foo.bar -v -k
* Trying 10.222.156.49:443...
* Connected to foo.bar (10.222.156.49) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to foo.bar:443
* Closing connection
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to foo.bar:443
% curl https://eg-int.company.com -v
* Trying 10.222.156.49:443...
* Connected to eg-int.company.com (10.222.156.49) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* CAfile: /etc/ssl/cert.pem
* CApath: none
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to eg-int.company.com:443
* Closing connection
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to eg-int.company.com:443
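To check this symptom without curl, here is a minimal Python sketch that attempts a TLS handshake the way curl -k does (no certificate or hostname verification) and reports how it failed. The function name probe_tls and its parameters are illustrative assumptions, not part of Envoy Gateway or egctl. Passing server_name sets SNI, so foo.bar and eg-int.company.com can both be probed against the same address 10.222.156.49.

```python
import socket
import ssl
from typing import Optional


def probe_tls(host: str, port: int = 443, server_name: Optional[str] = None,
              timeout: float = 5.0) -> str:
    """Try a TLS handshake and return a short status string.

    Roughly mirrors `curl -k`: verification is disabled, so this only checks
    whether the listener completes a handshake at all.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE   # like curl -k: do not verify the certificate
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname sets SNI, which Envoy uses to select a filter chain
            with ctx.wrap_socket(sock, server_hostname=server_name or host) as tls:
                return f"ok ({tls.version()})"
    except ssl.SSLError as exc:
        # e.g. ssl.SSLEOFError when the peer closes the connection mid-handshake
        return f"tls error: {exc}"
    except OSError as exc:
        # connection refused/reset; a reset during the handshake is what curl
        # reports as SSL_ERROR_SYSCALL
        return f"connection error: {exc}"
```

For example, probe_tls("10.222.156.49", server_name="foo.bar") should return an "ok (...)" status on a healthy proxy and one of the error strings while the listener is broken.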
Why are the listeners sometimes loaded in a different order?
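One hypothesis (not confirmed in this thread) is that when mergeGateways merges the two HTTPS listeners on port 443, the resulting Envoy listener is named after whichever Gateway happens to be processed first, and that order is not stable. The sketch below is hypothetical and is not Envoy Gateway's code: ListenerRef and merged_listener_name are made-up names. It picks the owner deterministically by creation timestamp and then {namespace}/{name}, the same tie-breakers the Gateway API spec uses when HTTPRoute matches conflict.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class ListenerRef:
    """Hypothetical stand-in for one Gateway listener (not an Envoy Gateway type)."""
    namespace: str
    gateway: str
    listener: str
    created: datetime  # creationTimestamp of the owning Gateway

    @property
    def name(self) -> str:
        # Same "<namespace>/<gateway>/<listener>" shape as the names in the Envoy logs
        return f"{self.namespace}/{self.gateway}/{self.listener}"


def merged_listener_name(candidates: Sequence[ListenerRef]) -> str:
    """Pick one name for listeners merged onto the same port, whatever the input order.

    Tie-breakers: oldest Gateway, then "<namespace>/<gateway>", then listener name.
    """
    owner = min(
        candidates,
        key=lambda ref: (ref.created, f"{ref.namespace}/{ref.gateway}", ref.listener),
    )
    return owner.name
```

With the ages from the kubectl output (internal is older than foobar), both input orders resolve to envoy-gateway-system/internal/https, so the listener name would not flip between restarts.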
Repro steps:
- enable mergeGateways
- create two different gateways
- add and delete HTTPRoutes
- fairly soon, you should see a listener fail (there is no error anywhere, but the port is simply not listening)
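The add/delete part of these steps can be scripted. This is a minimal sketch that assumes kubectl is on PATH and that one of the HTTPRoutes from the gist has been saved to a local file (httproute.yaml is a hypothetical name); the pause length is an arbitrary guess, not a value taken from Envoy Gateway.

```python
import subprocess
import time
from typing import List


def repro_commands(manifest: str, iterations: int) -> List[List[str]]:
    """Build the kubectl commands for repeatedly adding and deleting one HTTPRoute."""
    cmds: List[List[str]] = []
    for _ in range(iterations):
        cmds.append(["kubectl", "apply", "-f", manifest])
        cmds.append(["kubectl", "delete", "-f", manifest])
    return cmds


def run_repro(manifest: str = "httproute.yaml", iterations: int = 20,
              pause: float = 5.0) -> None:
    """Run the add/delete loop; stops on the first kubectl failure (check=True)."""
    for cmd in repro_commands(manifest, iterations):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
        # Arbitrary delay so each change is likely pushed to Envoy before the next one
        time.sleep(pause)
```

Calling run_repro() while tailing the Envoy pod logs for "lds: add/update listener" lines makes it easier to catch the moment the listener names change.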
Environment: Envoy Gateway (eg) 1.0.0
Logs:
I collected the listener configurations with egctl:
egctl config envoy-proxy listener -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gatewayclass=eg-internal
Not working: https://gist.github.com/zetaab/2e0f2f00174d4b6189290e095ebe5cf5
Working: https://gist.github.com/zetaab/08d2f7b3bfbd04a28be14bd990552214
As can be seen, the configuration is sometimes missing 400+ lines. When this happens, I need to delete all gateways other than the primary one (located in envoy-gateway-system) and then add the other gateways back. Then everything starts to work again.
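The comparison between the two gists can be made repeatable. This minimal sketch assumes both egctl outputs were saved as text files (working.yaml and not-working.yaml are hypothetical names) and uses Python's difflib to count the lines that appear in only one dump, which is where a gap like the missing 400+ lines would show up.

```python
import difflib
from typing import List, Tuple


def compare_dumps(working: str, broken: str) -> Tuple[int, int, List[str]]:
    """Diff two egctl listener dumps.

    Returns (lines only in the working dump, lines only in the broken dump, diff lines).
    """
    diff = list(difflib.unified_diff(
        working.splitlines(), broken.splitlines(),
        fromfile="working", tofile="not-working", lineterm="",
    ))
    body = diff[2:]  # drop the '--- working' / '+++ not-working' header lines
    only_working = sum(1 for line in body if line.startswith("-"))
    only_broken = sum(1 for line in body if line.startswith("+"))
    return only_working, only_broken, diff


# Example usage with the saved dumps:
#   from pathlib import Path
#   missing, extra, diff = compare_dumps(Path("working.yaml").read_text(),
#                                        Path("not-working.yaml").read_text())
#   print(f"{missing} lines only in the working config, {extra} only in the broken one")
```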
This issue has been automatically marked as stale because it has not had activity in the last 30 days.
This is still an issue with 1.0.1.
This may have something to do with the infrastructure annotations on the gateway. I am now trying to reproduce it with the latest main.
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.682][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.687][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/https'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.720][1][info][upstream] [source/common/listener_manager/lds_api.cc:63] lds: remove listener 'envoy-gateway-system/internal/https'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.724][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'echoserver/foobar/https-foo'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.725][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'
-> Nothing works. It removes the original 'envoy-gateway-system/internal/https' listener and adds the 'echoserver/foobar/https-foo' listener.
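The log sequence can be replayed mechanically. This minimal sketch parses the lds_api.cc lines (the regex is written against the exact lines in this issue and may not match other Envoy log formats) and tracks which listener names remain after each add/update or remove. It only reflects what the control plane pushed over LDS, not whether Envoy actually bound port 443.

```python
import re
from typing import Iterable, Set

# Matches lines such as:
#   ... lds: add/update listener 'envoy-gateway-system/internal/http'
#   ... lds: remove listener 'envoy-gateway-system/internal/https'
LDS_RE = re.compile(r"lds: (add/update|remove) listener '([^']+)'")


def active_listeners(log_lines: Iterable[str]) -> Set[str]:
    """Replay LDS add/update/remove log lines and return the listeners left."""
    active: Set[str] = set()
    for line in log_lines:
        m = LDS_RE.search(line)
        if not m:
            continue  # ignore unrelated log lines
        action, name = m.groups()
        if action == "add/update":
            active.add(name)
        else:
            active.discard(name)
    return active
```

Fed the five log lines from this comment, it leaves envoy-gateway-system/internal/http and echoserver/foobar/https-foo active, i.e. the internal/https listener is gone.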
I went through the configuration, and IMO it looks pretty much the same.