
bug: race condition? when using mergeGateway and multiple gateways

Open zetaab opened this issue 1 year ago • 3 comments

Description:

I have the following installation:

% kubectl get gateway -A             
NAMESPACE              NAME       CLASS         ADDRESS          PROGRAMMED   AGE
echoserver             foobar     eg-internal   10.222.156.49    True         34m
envoy-gateway-system   internal   eg-internal   10.222.156.49    True         50m

full yaml spec: https://gist.github.com/zetaab/8caa34f5072d5a8efc5c2425c331c561

and httproutes https://gist.github.com/zetaab/149545f3e0ae17c0b925bafd3512d1eb

When I add httproutes and the envoy proxies restart, all services randomly become unavailable. When the envoy pods start, I can see the following in the logs:

[2024-03-18 06:46:10.571][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/https'
[2024-03-18 06:46:10.573][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'

With these listeners, all services work (including services that come from gateways other than the one with the 'https' listener). However, when the log says

[2024-03-18 07:17:24.768][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'echoserver/foobar/https-foo'
[2024-03-18 07:17:24.769][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'

nothing works:

 % curl https://foo.bar -v -k     
*   Trying 10.222.156.49:443...
* Connected to foo.bar (10.222.156.49) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to foo.bar:443 
* Closing connection
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to foo.bar:443
% curl https://eg-int.company.com -v
*   Trying 10.222.156.49:443...
* Connected to eg-int.company.com (10.222.156.49) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to eg-int.company.com:443 
* Closing connection
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to eg-int.company.com:443 

Why are the listeners sometimes loaded in a different order?

Repro steps:

  1. use mergeGateways
  2. create two different gateways
  3. add and delete httproutes
  4. fairly soon you should see a listener fail for some reason (there is no error anywhere, but nothing is listening on the port)
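
Step 3 can be automated. The following is only a minimal sketch, assuming kubectl is on the PATH, that steps 1 and 2 are already in place, and that the httproutes are stored in a local manifest file (the file name `httproutes.yaml` and the pause length are hypothetical, not taken from this report):

```python
import subprocess
import time


def churn_commands(manifest, rounds):
    """Build the kubectl argument lists for `rounds` apply/delete cycles."""
    commands = []
    for _ in range(rounds):
        commands.append(["kubectl", "apply", "-f", manifest])
        commands.append(["kubectl", "delete", "-f", manifest])
    return commands


def run_churn(manifest, rounds, pause_seconds=5.0):
    """Run the cycles, pausing after each command so the proxies can pick up the change."""
    for argv in churn_commands(manifest, rounds):
        subprocess.run(argv, check=True)  # raises if kubectl exits non-zero
        time.sleep(pause_seconds)


# Example call (hypothetical file name, assumed pause):
# run_churn("httproutes.yaml", rounds=10)
```

Whether the failure shows up within a given number of rounds is not guaranteed; the report only says it happens "randomly".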

Environment: eg 1.0.0

Logs:

I took the listener configurations using egctl: egctl config envoy-proxy listener -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gatewayclass=eg-internal

not working: https://gist.github.com/zetaab/2e0f2f00174d4b6189290e095ebe5cf5

working: https://gist.github.com/zetaab/08d2f7b3bfbd04a28be14bd990552214

As can be seen, the configuration is sometimes missing 400+ lines. When this happens, I need to delete all gateways other than the primary one (located in envoy-gateway-system) and then add the other gateways back. After that, everything starts to work again.

zetaab • Mar 18 '24 07:03

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] • Apr 17 '24 08:04

This is still an issue with 1.0.1.

zetaab • Apr 25 '24 07:04

This may have something to do with the infrastructure annotations in the gateway. I am now trying to reproduce it with the latest main.

zetaab • Apr 26 '24 20:04

envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.682][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.687][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/https'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.720][1][info][upstream] [source/common/listener_manager/lds_api.cc:63] lds: remove listener 'envoy-gateway-system/internal/https'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.724][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'echoserver/foobar/https-foo'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.725][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'

-> nothing works. Envoy removes the original listener ('envoy-gateway-system/internal/https') and adds the foobar listener ('echoserver/foobar/https-foo').
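
To spot this state without reading the logs by hand, the LDS events can be replayed in order. This is a rough sketch, not part of Envoy or Envoy Gateway: the regular expression is based only on the log lines quoted above, and the function names and the list of expected listeners are assumptions made for illustration.

```python
import re

# Matches the two LDS log messages quoted in this issue, e.g.
#   lds: add/update listener 'envoy-gateway-system/internal/https'
#   lds: remove listener 'envoy-gateway-system/internal/https'
LDS_EVENT = re.compile(r"lds: (add/update|remove) listener '([^']+)'")


def active_listeners(log_lines):
    """Replay LDS events in order and return the set of listeners left active."""
    active = set()
    for line in log_lines:
        match = LDS_EVENT.search(line)
        if match is None:
            continue  # not an LDS listener event
        action, name = match.groups()
        if action == "remove":
            active.discard(name)
        else:
            active.add(name)
    return active


def missing_listeners(log_lines, expected):
    """Return the expected listener names that are not active after the replay."""
    return sorted(set(expected) - active_listeners(log_lines))


# The five events from the log above, with the pod and timestamp prefix trimmed.
SAMPLE = [
    "lds: add/update listener 'envoy-gateway-system/internal/http'",
    "lds: add/update listener 'envoy-gateway-system/internal/https'",
    "lds: remove listener 'envoy-gateway-system/internal/https'",
    "lds: add/update listener 'echoserver/foobar/https-foo'",
    "lds: add/update listener 'envoy-gateway-system/internal/http'",
]

# Assumed expectation: both listeners of the primary gateway should stay active.
EXPECTED = [
    "envoy-gateway-system/internal/http",
    "envoy-gateway-system/internal/https",
]

print(missing_listeners(SAMPLE, EXPECTED))
# prints ['envoy-gateway-system/internal/https']
```

The replay only tracks listener names, so it shows which listener was dropped but not why it was dropped.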

I went through the configuration and, in my opinion, it looks much the same.

zetaab • May 06 '24 07:05