Urgent: intermittent connection to remote host through Tailscale action
We use the Tailscale action in our GitHub Actions workflow. Recently, jobs have started failing intermittently because the runner cannot reach the remote host through Tailscale. The logs below were generated by the following commands:
Commands:
tailscale status
tailscale ping $SSH_HOST
Errors:
tailscale status
100.69.57.42 github-fv-az1726-261 github-fv-az1726-261.taild7c8e.ts.net linux -
# Health check:
# - no DERP home
# Update available: 1.52.0 -> 1.82.5, run `tailscale update` or `tailscale set --auto-update` to update.
ping "100.112.202.141" timed out
ping "100.112.202.141" timed out
ping "100.112.202.141" timed out
ping "100.112.202.141" timed out
pong from *** (100.112.202.141) via DERP(dbi) in 233ms
pong from *** (100.112.202.141) via DERP(dbi) in 266ms
pong from *** (100.112.202.141) via DERP(dbi) in 240ms
pong from *** (100.112.202.141) via DERP(dbi) in 358ms
pong from *** (100.112.202.141) via DERP(dbi) in 234ms
pong from *** (100.112.202.141) via DERP(dbi) in 249ms
direct connection not established
Run ping -c 5 $SSH_HOST
PING ***.taild7c8e.ts.net (100.112.202.141) 56(84) bytes of data.
--- ***.taild7c8e.ts.net ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4091ms
We have rerun the jobs multiple times. Sometimes we can reach the remote host and sometimes we cannot, regardless of how long we wait between reruns.
We are struggling with the same issue. I suspect it is caused by DERP server selection.
Also seeing 100% packet loss.
Seeing the same issue. It is intermittent and eventually works after a few reruns. This is on GitHub's public runners.
We have also started seeing this. It seems to be specific to the DERP servers that GitHub Actions runners mostly use. Possibly congestion?
Is the only workaround to run a custom DERP server?
There can be a delay between the time your GitHub Action's Tailscale client joins your tailnet and the time the destination Tailscale client learns of its presence and that it is allowed to connect. During that window, the delay manifests as a lack of connectivity.
v4 of the GitHub action now includes a ping parameter that you can use to wait for connectivity before proceeding. We hope that this will resolve your issue. If it does not, please feel free to reopen this ticket.
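For workflows that cannot move to v4 yet, the same wait-for-connectivity idea can be approximated with a retry loop in a workflow step. The following is a minimal sketch, not the action's own implementation: `retry` is a hypothetical helper name, and the attempt count and delay are arbitrary values to tune. It assumes bash is the step's shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or the
# attempt budget is exhausted.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    # Run the command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    # Give up once the budget is spent.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage (assumes SSH_HOST is set, as in the commands above):
#   retry 12 5 tailscale ping --c 1 --until-direct=false "$SSH_HOST"
#   retry 12 5 ping -c 1 "$SSH_HOST"
```

In the logs above, `tailscale ping` eventually receives pongs via DERP while the plain `ping` still reports 100% packet loss, so retrying the plain `ping` (or the actual SSH connection) may be the more meaningful readiness check. The `--until-direct=false` flag is intended to let `tailscale ping` return on the first pong rather than waiting for a direct path; check `tailscale ping --help` on the installed client version to confirm the flags are available.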