firezone icon indicating copy to clipboard operation
firezone copied to clipboard

Call `on_disconnect` for most portal connect errors

Open jamilbk opened this issue 1 year ago • 3 comments

If the portal host is typo'd, connlib will keep trying to connect to it and never call on_disconnect. It might be better to fail quickly on name failures to the portal so that the client is not stuck in a signed in but inoperable state:

connlib	07:18:20.591010-0700	FirezoneNetworkExtensionmacOS	dev.firezone.firezone	Reconnecting to portal on transient client error: websocket connection failed: IO error: failed to lookup address information: nodename nor servname provided, or not known  backoff=22.666594299s max_elapsed_time=Some(86400s)

jamilbk avatar Apr 03 '24 14:04 jamilbk

Screenshot 2024-04-03 at 7 20 50 AM

jamilbk avatar Apr 03 '24 14:04 jamilbk

refs #5467

jamilbk avatar Jun 29 '24 16:06 jamilbk

I think for this particular one we can just immediately call on_disconnect if the initial connection fails.

For roaming / deploys, we can call reconnect which should have retry logic. wdyt @thomaseizinger ??

jamilbk avatar Jun 29 '24 16:06 jamilbk

I think for this particular one we can just immediately call on_disconnect if the initial connection fails.

For roaming / deploys, we can call reconnect which should have retry logic. wdyt @thomaseizinger ??

It is a bit tricky. At the moment, phoenix-channel does not differentiate between setting up a connection for the first time or because we called reconnect: We always just try to connect, see: https://github.com/firezone/firezone/blob/6641e1b70f0feb42e9077d37f4ac612916518e26/rust/phoenix-channel/src/lib.rs#L220-L223

It would be better if we can unambiguously identify "fatal" errors where we never want to retry. We have these already, like when the portal returns us 401. IO errors are a bit trickier because they are somewhat opaque (and platform-dependent).

I just experimented with this and I think with the addition of resolving the portal domain before connlib starts up in https://github.com/firezone/firezone/pull/5563, this is resolved at least for typos in the domain name. I tried to set the API as foo.firez.one and got the following behaviour:

❯ sudo RUST_LOG=wire::api=trace,debug target/debug/firezone-headless-client -u wss://foo.firez.one --firezone-id $(hostname) standalone
Place your right index finger on the fingerprint reader
2024-07-04T01:32:18.844543Z  INFO firezone_headless_client::standalone: git_version="gui-client-1.1.3-3-gf6e99752e-modified"
2024-07-04T01:32:18.845013Z  INFO firezone_headless_client::standalone: Running in headless / standalone mode
2024-07-04T01:32:18.880697Z  INFO firezone_headless_client::dns_control::linux: dns_control_method=Some(Systemd)
2024-07-04T01:32:18.881183Z DEBUG netlink_proto::connection: reading incoming messages
2024-07-04T01:32:18.881227Z DEBUG netlink_proto::codecs: NetlinkCodec: decoding next message
2024-07-04T01:32:18.881269Z DEBUG netlink_proto::connection: forwarding unsolicited messages to the connection handle
2024-07-04T01:32:18.881306Z DEBUG netlink_proto::connection: forwarding responses to previous requests to the connection handle
2024-07-04T01:32:18.881347Z DEBUG netlink_proto::connection: handling requests
2024-07-04T01:32:18.881389Z DEBUG netlink_proto::connection: sending messages
2024-07-04T01:32:21.731527Z ERROR connlib_client_shared: connlib failed: failed to lookup address information: Name or service not known
2024-07-04T01:32:21.731604Z ERROR firezone_headless_client: Got `on_disconnect` from connlib error=Io(Custom { kind: Uncategorized, error: "failed to lookup address information: Name or service not known" })
2024-07-04T01:32:21.731825Z DEBUG firezone_headless_client::dns_control::linux: Reverting DNS control...
Error: Firezone disconnected

Caused by:
    failed to lookup address information: Name or service not known

thomaseizinger avatar Jul 04 '24 01:07 thomaseizinger

I just experimented with this and I think with the addition of resolving the portal domain before connlib starts up in #5563, this is resolved at least for typos in the domain name.

Can we consider this resolved for that case @jamilbk?

thomaseizinger avatar Jul 23 '24 07:07 thomaseizinger

fixed by #5970

jamilbk avatar Jul 23 '24 07:07 jamilbk