Gateway status in the GUI sometimes doesn't sync properly

Open NerijusRazvodovskis opened this issue 8 months ago • 5 comments

Hello,

From time to time I see an edge case where the GUI shows the gateway as disconnected even though the gateway itself is connected to the core service. It happens randomly when, for some reason, the gateway loses its connection to the core service for a while. For example, here are logs from the gateway:

Apr 15 14:20:01 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T14:20:01Z INFO defguard_gateway::gateway] Connected to Defguard gRPC endpoint: https://defguard.labas.io:444/
Apr 15 23:40:50 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T23:40:50Z ERROR defguard_gateway::gateway] Disconnected from Defguard gRPC endoint: https://defguard.labas.io:444/: status: Unknown, message: "h2 protocol error: error reading a body from connec>
Apr 15 23:40:50 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T23:40:50Z ERROR defguard_gateway::gateway] Updates stream aborted; reconnecting
Apr 15 23:41:20 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T23:41:20Z ERROR defguard_gateway::gateway] Couldn't retrieve gateway configuration from the core. Using gRPC URL: https://defguard.labas.io:444/. Retrying in 10s. Error: status: Unavailable, mes>
Apr 15 23:41:40 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T23:41:40Z INFO  defguard_gateway::gateway] Connected to Defguard gRPC endpoint: https://defguard.labas.io:444/

Meanwhile, the GUI still shows it as disconnected:

[Screenshot: the GUI shows the gateway as disconnected]

Everything works from the gateway's perspective, and users can connect to it. However, shouldn't the status in the GUI also sync automatically once the gateway reconnects to the core service?

If I restart the defguard-gateway service manually, the GUI status also changes to connected.

Core service version: 1.2.3
Proxy service version: 1.2.0
Gateway version: 1.2.1
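
To compare the gateway's own view with what the GUI reports, the journal lines quoted above can be reduced to the last connect/disconnect event. This is only a minimal sketch: the function name `last_gateway_state` is hypothetical, and it assumes the message prefixes seen in this log ("Connected to Defguard gRPC", "Disconnected from Defguard gRPC") are the only markers that matter.

```python
import re

# Matches the bracketed part of a gateway journal line, e.g.
# [2025-04-15T14:20:01Z INFO defguard_gateway::gateway] Connected to ...
LINE_RE = re.compile(
    r"\[(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<module>\S+)\]\s+(?P<msg>.*)"
)


def last_gateway_state(lines):
    """Return (state, timestamp) of the last connect/disconnect event, or None."""
    last = None
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a gateway log line in the expected shape
        msg = match.group("msg")
        # Prefixes are taken from the log in this issue (assumption: they are stable).
        if msg.startswith("Connected to Defguard gRPC"):
            last = ("connected", match.group("ts"))
        elif msg.startswith("Disconnected from Defguard gRPC"):
            last = ("disconnected", match.group("ts"))
    return last


# Lines copied from the log above (the second one is truncated in the original).
sample = [
    'Apr 15 14:20:01 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T14:20:01Z INFO defguard_gateway::gateway] Connected to Defguard gRPC endpoint: https://defguard.labas.io:444/',
    'Apr 15 23:40:50 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T23:40:50Z ERROR defguard_gateway::gateway] Disconnected from Defguard gRPC endoint: https://defguard.labas.io:444/: status: Unknown, message: "h2 protocol error: error reading a body from connec>',
    'Apr 15 23:41:40 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T23:41:40Z INFO  defguard_gateway::gateway] Connected to Defguard gRPC endpoint: https://defguard.labas.io:444/',
]

state = last_gateway_state(sample)
print(state)  # ('connected', '2025-04-15T23:41:40Z')
```

If the result is "connected" while the GUI still shows "disconnected", the mismatch is on the core/GUI side rather than in the gateway's own reconnect handling.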

NerijusRazvodovskis, Apr 18 '25 11:04

Hi @NerijusRazvodovskis, could you please try capturing logs from the core at the debug level as well? It would help to see how the core handles this case, especially since it logs connect/disconnect events.

Thanks!

filipslezaklab, Apr 18 '25 12:04

@filipslezaklab Right, I will try to do that, but it could take a while to replicate.

NerijusRazvodovskis, Apr 18 '25 12:04

@NerijusRazvodovskis I suspect the gateway may actually not be connected to the core, even if only for a while (could those connection losses be happening in the network?). The gateway is designed to reconnect and to "hold" the status of peers in the meantime, so the VPN will still work.
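
The "reconnect and hold peers" behaviour described here can be illustrated with a rough sketch. This is not the actual defguard-gateway code: the class `GatewaySketch`, its fields, and the `fetch_config` callable are all hypothetical, and only the 10-second retry delay is taken from the "Retrying in 10s" message in the log. It also says nothing about how the core or the GUI tracks gateway status.

```python
import time


class GatewaySketch:
    """Hypothetical gateway loop: keeps the last applied peers across disconnects."""

    def __init__(self, fetch_config, retry_delay=10.0, sleep=time.sleep):
        self.fetch_config = fetch_config  # callable returning {peer_key: allowed_ips}
        self.retry_delay = retry_delay    # the log in this issue shows "Retrying in 10s"
        self.sleep = sleep                # injectable so the sketch can run without waiting
        self.peers = {}                   # last configuration applied to the interface
        self.connected = False            # the gateway's own view of the core link

    def step(self):
        """Try once to sync with the core; return True on success."""
        try:
            config = self.fetch_config()
        except ConnectionError:
            # Core unreachable: mark the link down but leave self.peers untouched,
            # so peers that were already configured keep working.
            self.connected = False
            self.sleep(self.retry_delay)
            return False
        self.peers = dict(config)
        self.connected = True
        return True
```

In this model the VPN keeps working during an outage because `peers` is only replaced after a successful fetch, which matches the observation that users could still connect while the status looked wrong.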

teon, Apr 18 '25 13:04

@teon But the end of the log says that the gateway reconnected to the core service (right after the errors):

Apr 15 23:41:40 bnk.labas.io defguard-gateway[1328483]: [2025-04-15T23:41:40Z INFO  defguard_gateway::gateway] Connected to Defguard gRPC endpoint: https://defguard.labas.io:444/

However, I will try to simulate this behaviour sometime next week. I have already enabled debug mode on the core service in case it happens during the weekend.

NerijusRazvodovskis, Apr 18 '25 14:04

Sadly, I couldn't replicate it by hand. I tried blocking the connection to the core/proxy services a few times and resuming it later, and it was handled correctly each time. It seems this edge case happens only occasionally, leaving the GUI status out of sync. I will keep testing and update this issue when possible.

NerijusRazvodovskis, Apr 25 '25 10:04

We saw similar behavior, which on our side seemed to be triggered when the ingress in front of the core was restarted (ingress nginx with gRPC; we know it's not officially supported here). The core logs didn't indicate that the gateway had reconnected, but the gateway was still working, even with 2FA enabled. I haven't enabled debug logs yet, but I'm sharing this in case it helps to replicate the issue. I will try to replicate it on our side in the coming weeks (and maybe replace the ingress with a sidecar doing SSL offloading).

Lebvanih, May 05 '25 07:05

Closing the issue since we can't reproduce it. If you have any more data or debug logs, please reopen.

teon, Aug 06 '25 10:08