
"Port already in use" error when restarting client after connection loss

Open ricardoapaes opened this issue 11 months ago • 9 comments

While using the client software on a Windows machine, I noticed instances where it would stop functioning. I realized that when the internet connection dropped (internet connectivity in Brazil can be unreliable), the software wouldn't detect the outage and would stop working (which is expected). However, when the connection was restored, the software often failed to resume, especially after prolonged periods (more than 10 minutes). As a workaround, I created a tool that polled for connectivity every minute. If the connection dropped, it would stop and restart the client. Unfortunately, I encountered a recurring issue: the error "port already in use" occurred frequently. After examining the code, I couldn't identify any obvious improvements, as it seemed to already include checks for removing stale connections.
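For illustration, a watchdog like the one described above might look roughly like the following Python sketch. This is not the tool from the report: the function names, the TCP-connect reachability check, and the 60-second interval are all assumptions. Note that it waits for the old process to exit before starting a new one, which rules out a local leftover but cannot release a port the server still considers taken.

```python
import socket
import subprocess
import time


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def stop(proc: subprocess.Popen, grace: float = 10.0) -> None:
    """Terminate the child process and wait until it has actually exited."""
    if proc.poll() is not None:
        return  # already gone
    proc.terminate()
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


def watch(cmd: list, host: str, port: int, interval: float = 60.0) -> None:
    """Run cmd; after an outage, restart it once host:port is reachable again."""
    proc = subprocess.Popen(cmd)
    try:
        while True:
            time.sleep(interval)
            if proc.poll() is None and is_reachable(host, port):
                continue  # client still running and server reachable
            stop(proc)
            while not is_reachable(host, port):
                time.sleep(interval)  # wait for connectivity to return
            proc = subprocess.Popen(cmd)
    finally:
        stop(proc)
```

A call could look like watch(["bore", "local", "3000", "--to", "bore.pub", "--port", "12345"], "bore.pub", 7835), where 7835 is assumed to be the server's control port.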

ricardoapaes avatar Dec 10 '24 11:12 ricardoapaes

@ekzhang We’re encountering the same issue: after any connection interruption, it becomes impossible to reconnect the client using a fixed host-port due to a “port already in use” error. This is likely related to the fact that SO_REUSEPORT and SO_REUSEADDR are not set by default.

sc0rp10 avatar Apr 17 '25 16:04 sc0rp10

Unfortunately, I can't reproduce this on macOS. It may be an operating-system-dependent issue, and I'm not sure about all of the implications of SO_REUSEADDR and SO_REUSEPORT for different operating systems.

ekzhang avatar Apr 17 '25 17:04 ekzhang

Unfortunately, I can't reproduce this on macOS. It may be an operating-system-dependent issue, and I'm not sure about all of the implications of SO_REUSEADDR and SO_REUSEPORT for different operating systems.

I face the same problem when I change networks on my phone. I use bore to expose a web server on the phone to the network (it's stupid, but it's easy). When I switch access points, bore local goes down, and when I reconnect it says that the port is in use. Would it be possible to add an argument that makes the server wait N seconds for a lost client to reconnect?

bropines avatar Apr 19 '25 19:04 bropines

Hm, so I spent a bit of time looking at this again. To clarify, when you run:

bore local 3000 --to bore.pub

Everything works, including on reconnection. But if you repeatedly do:

bore local 3000 --to bore.pub --port 12345

Then sometimes you get a bore server error. Unfortunately if this is the case, there's nothing we can do about it for the public server — selecting ports on the public bore.pub server is offered on a best-effort basis since it's shared among all users. You can't guarantee that you'll have access to any particular port number.

That said, if you're running your own private bore server instance, we could potentially add a CLI flag that enables connecting to a port already in use on that server. We wouldn't do it via SO_REUSEPORT though, since that would break the tunneling protocol; we'd need to actually kill the previous connection (if any) before binding the new one.

If that's an acceptable solution then I'm OK with having that as a non-default option for bore server.

ekzhang avatar Apr 19 '25 19:04 ekzhang

Our setup uses a self‑hosted server to make remote devices accessible over the Internet. However, after certain client actions (we’ve identified a 100% reproducible sequence, though it’s complex to describe), the server’s connection hangs indefinitely: the server still believes the client is connected, but the client cannot reestablish its previous session. In these cases, restarting is the only reliable remedy.

I believe the heartbeat mechanism must be overhauled. Rather than the current approach—where the server sends a heartbeat over a half‑closed socket and assumes the connection remains alive—the server and client should implement a true “ping‑pong” scheme, actively exchanging keep‑alive messages on an open channel.
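As a rough illustration of the proposed ping-pong scheme, here is a minimal Python sketch. It is not bore's wire protocol: the PING/PONG byte strings, the function names, and the timeout are hypothetical. The point is that the sender treats the connection as dead unless a reply arrives in time, instead of assuming liveness because a write succeeded.

```python
import socket

PING = b"PING\n"  # hypothetical message framing, not bore's protocol
PONG = b"PONG\n"


def ping(sock: socket.socket, timeout: float = 5.0) -> bool:
    """Send PING and report whether a full PONG came back.

    Each socket operation is bounded by `timeout` seconds.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(PING)
        reply = b""
        while len(reply) < len(PONG):
            chunk = sock.recv(len(PONG) - len(reply))
            if not chunk:
                return False  # peer closed the connection
            reply += chunk
        return reply == PONG
    except OSError:  # covers timeouts and connection resets
        return False


def answer_pings(sock: socket.socket) -> None:
    """Reply PONG to every PING until the peer closes the connection."""
    with sock, sock.makefile("rb") as lines:
        for line in lines:
            if line == PING:
                sock.sendall(PONG)
```

Under this scheme, a side that sees ping() return False would drop the connection and free whatever it holds for that peer.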

Additionally, I implemented an experimental change: when a new client attempts to connect using a specific host–port pair, we first check a hashmap—where each key is a host–port pair and each value is a TcpListener instance—to see if that pair already exists. If it does, I terminate the previous connection before establishing the new one. I’m not certain this approach is ideal (especially for public servers that assign client ports dynamically), but it demonstrates that the current heartbeat mechanism can fail in certain cases.
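The idea can be sketched in Python as follows. This is an illustration under assumptions, not the actual patch: the PortRegistry class and its claim method are hypothetical names, and a real server would also have to shut down connections already accepted through the old listener.

```python
import socket


class PortRegistry:
    """Tracks one listener per (host, port); a new claim evicts the old one."""

    def __init__(self) -> None:
        self.listeners: dict = {}

    def claim(self, host: str, port: int) -> socket.socket:
        """Close any listener already registered for (host, port), then bind."""
        old = self.listeners.pop((host, port), None)
        if old is not None:
            old.close()  # release the port before binding it again
        listener = socket.create_server((host, port))
        # Key on the actual bound port so that port 0 (any free port) works.
        self.listeners[(host, listener.getsockname()[1])] = listener
        return listener
```

Claiming the same pair twice returns a fresh listener on the same port and leaves the first one closed.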

sc0rp10 avatar Apr 19 '25 19:04 sc0rp10

We wouldn't do it via SO_REUSEPORT though, since that would break the tunneling protocol, we'd need to actually kill the previous connection (if any) before binding the new one.

What if we allow reconnecting only with --secret? That is, only the client that knows the key will be able to reconnect to the used port. So we will get rid of any attacks, and we will be able to safely reconnect one client...
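One way such a check could work is a challenge-response over the shared secret, sketched below in Python. This is a hypothetical illustration of the suggestion, not a description of how bore handles --secret: all function names are made up, and the takeover would only proceed when may_take_over returns True.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Server side: a fresh random challenge for each takeover attempt."""
    return secrets.token_bytes(16)


def sign(secret: str, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the secret without sending it."""
    return hmac.new(secret.encode(), challenge, hashlib.sha256).digest()


def may_take_over(secret: str, challenge: bytes, response: bytes) -> bool:
    """Server side: allow evicting the old client only on a valid response."""
    return hmac.compare_digest(sign(secret, challenge), response)
```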

bropines avatar Apr 19 '25 19:04 bropines

I believe the heartbeat mechanism must be overhauled. Rather than the current approach—where the server sends a heartbeat over a half‑closed socket and assumes the connection remains alive—the server and client should implement a true “ping‑pong” scheme, actively exchanging keep‑alive messages on an open channel.

TCP sends acknowledgements already and keeps a receive buffer, so I don't think this should hang indefinitely. It will eventually exit, even if it may take some time. I think that's OK though, since it makes it possible for people to reconnect after connection errors.

If you want to configure this on your server, you can either tune sysctls like net.ipv4.tcp_retries2 and net.ipv4.tcp_rto_{min,max} or patch bore with a couple lines of code to change the TCP_USER_TIMEOUT sockopt. I don't think we'll change how heartbeats work though.
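To show what the TCP_USER_TIMEOUT option does, here is a small Python sketch; bore itself is written in Rust, so this only illustrates the socket option, not the patch. The helper name and the 30-second value in the usage note are assumptions. On Linux the option caps how long sent data may stay unacknowledged before the kernel closes the connection.

```python
import socket


def set_user_timeout(sock: socket.socket, millis: int) -> bool:
    """Set TCP_USER_TIMEOUT in milliseconds.

    Returns False on platforms where Python does not expose the option.
    """
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)  # Linux-only constant
    if opt is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, opt, millis)
    return True
```

For example, set_user_timeout(conn, 30_000) on an accepted connection would ask the kernel to give up after about 30 seconds of unacknowledged data.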

Additionally, I implemented an experimental change: when a new client attempts to connect using a specific host–port pair, we first check a hashmap—where each key is a host–port pair and each value is a TcpListener instance—to see if that pair already exists. If it does, I terminate the previous connection before establishing the new one.

What if we allow reconnecting only with --secret? That is, only the client that knows the key will be able to reconnect to the used port. So we will get rid of any attacks, and we will be able to safely reconnect one client...

Yeah, that's what I was describing above. I think it's ok to make it an option.

ekzhang avatar Apr 19 '25 20:04 ekzhang

If you want to configure this on your server, you can either tune sysctls like net.ipv4.tcp_retries2 and net.ipv4.tcp_rto_{min,max} or patch bore with a couple lines of code to change the TCP_USER_TIMEOUT sockopt. I don't think we'll change how heartbeats work though.

I’ll test these sysctl parameters and share the results. Based on our previous observations, the client can remain stuck in this state, with the server refusing the connection due to "port already in use", for several days. I believe this duration far exceeds the default values for these parameters, but I’ll verify.

sc0rp10 avatar Apr 19 '25 20:04 sc0rp10

Yeah, that's what I was describing above. I think it's ok to make it an option.

May I know when you will be able to add this? Or should I not rush you?

bropines avatar Apr 24 '25 10:04 bropines