The source of "Failed to run HTTPS probe" and "Failed to connect to relay server" is not printed to the logs
I am trying to upgrade iroh from 0.34.0 to 0.92.0 and iroh-relay from 0.28.1 to 0.92.0 at the same time. The client update is in https://github.com/chatmail/core/pull/7267 and I updated the relay manually by downloading the release binary and installing it on a VPS. iroh-relay is running behind nginx.
Previously this setup with iroh-relay proxied behind nginx worked. I updated the nginx config and added the /ping endpoint:
# Proxy to iroh-relay service.
location /relay {
    proxy_pass http://127.0.0.1:3340;
    proxy_http_version 1.1;
    # Upgrade header is normally set to "iroh derp http" or "websocket".
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
location /relay/probe {
    proxy_pass http://127.0.0.1:3340;
    proxy_http_version 1.1;
}
location /ping {
    proxy_pass http://127.0.0.1:3340;
    proxy_http_version 1.1;
}
location /generate_204 {
    proxy_pass http://127.0.0.1:3340;
    proxy_http_version 1.1;
}
However, connecting without DNS discovery and IP address sharing currently does not work, so I used iroh-doctor to find out why.
I cloned the iroh-doctor main branch (at commit https://github.com/n0-computer/iroh-doctor/commit/236d378899f09be21d7e1a93e0017dd922d335a4) and configured iroh:
cat ~/.config/iroh/iroh.config.toml
[[relay_nodes]]
url = "https://example.org./"
I cleaned ~/.local/share/iroh to make sure there is no existing key configuration or anything like that.
I then ran iroh-doctor from source using RUST_LOG=debug cargo run accept --disable-discovery and RUST_LOG=debug cargo run connect --relay-url https://example.org./ <redacted> --disable-discovery (where <redacted> is the key printed by the first command). It did not connect.
The first command (iroh-doctor accept) printed this to the logs:
2025-10-03T00:24:02.482602Z WARN ep{me=ca75ec9fa5}:actor:reportgen.actor:run-probe{proto=Https delay=400ms relay_node=RelayNode { url: RelayUrl("https://example.org./"), quic: Some(RelayQuicConfig { port: 7842 }) }}: iroh::net_report::reportgen: probe failed: Failed to run HTTPS probe
I see this is failing while trying to use QUIC over port 7842. I understand it fails because nobody listens on port 7842.
Still, there is a source field in the error:
https://github.com/n0-computer/iroh/blob/9c8540fad98c3bde4cb9398a2bb82febab5c96a7/iroh/src/net_report/reportgen.rs#L459-L460
I think it should be printed somewhere. That would make it easier to understand the problem (connection failure, wrong HTTP status code, or something else) without having to guess.
A related problem showing what the errors look like: https://github.com/n0-computer/iroh/issues/3377
There is a similar problem that I actually have: the client sometimes manages to connect this way and sometimes fails. I have to restart it several times before it eventually works. When it does not work, it prints this error:
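To illustrate what printing the source could look like, here is a minimal sketch that walks std::error::Error::source() and joins every level into one line. ProbeError and ConnectionRefused are hypothetical stand-ins, not iroh's actual error types, and format_error_chain is a helper name made up for this example.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical innermost error; stands in for whatever the real cause is.
#[derive(Debug)]
struct ConnectionRefused;

impl fmt::Display for ConnectionRefused {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection refused")
    }
}

impl Error for ConnectionRefused {}

// Hypothetical outer error whose Display prints only the outer message,
// similar to what currently ends up in the log.
#[derive(Debug)]
struct ProbeError {
    source: ConnectionRefused,
}

impl fmt::Display for ProbeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Failed to run HTTPS probe")
    }
}

impl Error for ProbeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Joins an error and all of its sources into a single line,
/// separated by ": ".
fn format_error_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}
```

With these stand-in types, format_error_chain(&ProbeError { source: ConnectionRefused }) yields "Failed to run HTTPS probe: connection refused", which is the kind of line I would hope to see in the log.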
RUST_LOG=debug cargo run connect --relay-url https://example.org./ d413fbd68ae44f43b20ab5d741a1e9de07e58a0806eb7280af3703a3a3720ad8 --disable-discovery
...
2025-10-03T00:38:27.983409Z DEBUG ep{me=c379afa657}:relay-actor:active-relay{url=https://example.org./}:dialing: iroh_relay::client::tls: Starting TLS handshake
2025-10-03T00:38:27.983714Z DEBUG ep{me=c379afa657}:relay-actor:active-relay{url=https://example.org./}:dialing: rustls::client::hs: No cached session for DnsName("example.org.")
2025-10-03T00:38:27.984607Z DEBUG ep{me=c379afa657}:relay-actor:active-relay{url=https://example.org./}:dialing: rustls::client::hs: Not resuming any session
2025-10-03T00:38:28.029039Z WARN ep{me=c379afa657}:relay-actor:active-relay{url=https://example.org./}: iroh::magicsock::transports::relay::actor: Failed to connect to relay server
There is an error here: https://github.com/n0-computer/iroh/blob/9c8540fad98c3bde4cb9398a2bb82febab5c96a7/iroh/src/magicsock/transports/relay/actor.rs#L221-L222
But DialError is not printed to the log, so I cannot tell what actually happened. When the connection fails like this, nginx does not log this line:
Oct 03 00:38:26 hostname nginx[153567]: <redacted> nginx: 127.0.0.1 - - [03/Oct/2025:00:38:26 +0000] "GET /relay HTTP/1.1" 101 1779 "-" "-"
So apparently the WebSocket connection does not happen, but I cannot tell why from the logs.
I found out that DNS was configured to return two IP addresses, one valid and one invalid, left over from previous experiments. So maybe it will start working now. But it would be much easier if I got the actual error in the log and not only the outer "Failed to connect to relay server".
The issue may need to be moved to the iroh-doctor repo if this is solvable there, e.g. by configuring tracing_subscriber somehow.
FWIW the TLS validation error is fixed in #3486. But this issue is more about the lack of clear logs, which is indeed worrisome. We generally use the "alternate" formatter ({:#}) for errors, which in anyhow used to print the source errors. Perhaps this broke in the switch to snafu.
The core issue for formatting is this: https://github.com/shepmaster/snafu/issues/390
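For context on why {:#} can silently stop printing sources: the alternate flag is only a hint passed to the Display impl, so an error type that ignores it prints the same text for {} and {:#}. The sketch below shows this with plain std types. ConnectError, TlsError and the Chain wrapper are hypothetical names for illustration, not snafu or iroh APIs, and it assumes the derived Display ignores the flag.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical inner cause.
#[derive(Debug)]
struct TlsError;

impl fmt::Display for TlsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid peer certificate")
    }
}

impl Error for TlsError {}

// Hypothetical outer error whose Display ignores the alternate flag.
#[derive(Debug)]
struct ConnectError {
    source: TlsError,
}

impl fmt::Display for ConnectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Failed to connect to relay server")
    }
}

impl Error for ConnectError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Wrapper that appends the sources only when formatted with `{:#}`.
struct Chain<'a>(&'a (dyn Error + 'static));

impl fmt::Display for Chain<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)?;
        if f.alternate() {
            let mut current = self.0.source();
            while let Some(cause) = current {
                write!(f, ": {}", cause)?;
                current = cause.source();
            }
        }
        Ok(())
    }
}
```

Here format!("{:#}", err) on a ConnectError still gives only the outer message, while format!("{:#}", Chain(&err)) gives "Failed to connect to relay server: invalid peer certificate". A wrapper like this at the log call site would be one possible workaround until the formatting issue is resolved upstream.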