Why does the rathole connection break so easily?

These are the settings on my VPS (public IP):

[server]
bind_addr = "0.0.0.0:58000"
default_token = "xxx"
heartbeat_interval = 30

# To Loopback Address for Nginx (Public Service)
[server.services.filebrowser]
bind_addr = "10.10.11.1:58097"

[server.services.navidrome]
bind_addr = "10.10.11.1:58095"

Server Log:

2024-03-10T16:19:18.476494Z  INFO config_watcher{path="rathole-server.toml"}: rathole::config_watcher: Start watching the config
2024-03-10T16:19:18.476573Z  INFO rathole::server: Listening at 0.0.0.0:58000
2024-03-10T16:19:20.511616Z  INFO connection{addr=66.96.225.80:46036}: rathole::server: Try to handshake a control channel
2024-03-10T16:19:20.512472Z  INFO connection{addr=66.96.225.80:46050}: rathole::server: Try to handshake a control channel
2024-03-10T16:19:20.540801Z  INFO connection{addr=66.96.225.80:46036}: rathole::server: Control channel established service=filebrowser
2024-03-10T16:19:20.541182Z  INFO connection{addr=66.96.225.80:46036}:handle{service=filebrowser}:run_tcp_connection_pool: rathole::server: Listening at 10.10.11.1:58097
2024-03-10T16:19:20.541728Z  INFO connection{addr=66.96.225.80:46050}: rathole::server: Control channel established service=navidrome
2024-03-10T16:19:20.541900Z  INFO connection{addr=66.96.225.80:46050}:handle{service=navidrome}:run_tcp_connection_pool: rathole::server: Listening at 10.10.11.1:58095
2024-03-10T16:23:36.505543Z ERROR connection{addr=66.96.225.80:46050}:handle{service=navidrome}:run: rathole::server: Failed to write control cmds: Broken pipe (os error 32)
2024-03-10T16:23:36.505622Z  INFO connection{addr=66.96.225.80:46050}:handle{service=navidrome}:run: rathole::server: Control channel shutdown
2024-03-10T16:23:49.081750Z ERROR connection{addr=66.96.225.80:46036}:handle{service=filebrowser}:run: rathole::server: Failed to write control cmds: Broken pipe (os error 32)
2024-03-10T16:23:49.081822Z  INFO connection{addr=66.96.225.80:46036}:handle{service=filebrowser}:run: rathole::server: Control channel shutdown
2024-03-10T16:24:09.457639Z  INFO connection{addr=66.96.225.80:46050}:handle{service=navidrome}:run_tcp_connection_pool: rathole::server: TCPListener shutdown

And these are the settings on my local server (dynamic IP behind NAT):

[client]
remote_addr = "x.x.x.x:58000"
default_token = "xxx"
heartbeat_timeout = 0
retry_interval = 1

# From Loopback Address to remote_addr (Public Service)
[client.services.filebrowser]
local_addr = "127.0.0.1:8097"

[client.services.navidrome]
local_addr = "127.0.0.1:8095"

Local log:

2024-03-10T16:21:47.641352Z  WARN handle{service=navidrome}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:01.977000Z  WARN handle{service=filebrowser}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:01.977151Z  WARN handle{service=filebrowser}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:08.121382Z  WARN handle{service=navidrome}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:08.121592Z  WARN handle{service=navidrome}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:08.121397Z  WARN handle{service=navidrome}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:18.361328Z  WARN handle{service=filebrowser}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:18.361314Z  WARN handle{service=filebrowser}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:20.409205Z  WARN handle{service=navidrome}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)
2024-03-10T16:22:59.320877Z  WARN handle{service=filebrowser}:run: rathole::client: Failed to run the data channel: Failed to connect to x.x.x.x:58000: Connection timed out (os error 110)

Mar 10 '24 16:03 tarokeitaro

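After using it for a while, it feels unstable. One port stopped connecting, and another still connected but was very slow. Only restarting the client fixed it, so I switched back to frp.

I started using it on February 21, so I had used it for about 20 days.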

Mar 11 '24 02:03 kzhui125

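I have been using it for more than half a year. It basically only lasts three to five days without errors, which is quite unstable.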

Apr 28 '24 03:04 lanyuue

It looks like the author has had no time for this project since starting a job.

Apr 28 '24 06:04 typochecker

Has anyone got a solution? This tool is good, but it's not that stable, and I still need it.

May 06 '24 01:05 tarokeitaro

@tarokeitaro You can use frp instead for now.

May 06 '24 03:05 kzhui125

8k stars, yet nobody can contribute commits to keep the project alive. That's really sad.

May 07 '24 01:05 lanyuue

@tarokeitaro This is what I run from cron every 6 hours. It checks the container logs for "Connection timed out (os error 110)". If the string is found, it changes to /docker and runs: docker compose up -d --force-recreate rathole

Of course, you can customize it. I have it this way because my rathole container lives in a massive docker-compose.yml file alongside a lot of other containers.

In every case (the container is recreated, the recreate fails, or the string "Connection timed out (os error 110)" is not found), the script sends a Discord notification.
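
The six-hour schedule described above could be written as a crontab entry along these lines. This is only a sketch: the script path and log path are hypothetical placeholders, not the locations used in the original setup.

```shell
# Hypothetical locations; adjust to where the script and its log actually live.
script_path="/docker/rathole_recreate.sh"
log_path="/var/log/rathole_recreate.log"

# Minute 0 of every 6th hour: 00:00, 06:00, 12:00 and 18:00.
cron_line="0 */6 * * * $script_path >> $log_path 2>&1"
echo "$cron_line"

# To install it for the current user (uncomment to run):
# ( crontab -l 2>/dev/null; echo "$cron_line" ) | crontab -
```

Redirecting the output to a file keeps the echo messages from the script around for later inspection, since cron does not show them anywhere by default.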

rathole_recreate.sh

#!/bin/bash

# Define container name (assuming it matches the service name in docker-compose.yml)
container_name="rathole"  # Replace if your service name is different

# Error string to search for
error_string="Connection timed out (os error 110)"

# Discord webhook URL (replace with your actual webhook URL)
webhook_url=""

# Function to send Discord notification
function send_discord_notification() {
  message="$1"
  content="$message"
  curl -H "Content-Type: application/json" -X POST -d "{\"username\": \"Rathole\", \"content\": \"$content\"}" "$webhook_url"
}

# Get container logs (stdout and stderr, since docker logs keeps the two streams separate)
logs=$(docker logs "$container_name" 2>&1)

# Check if error string exists in logs
if grep -q "$error_string" <<< "$logs"; then
  message="Error '$error_string' found in logs. Recreating container..."
  echo "$message"
  send_discord_notification "$message"
  
  # Change directory to /docker (assuming your docker-compose.yml is there); stop if that fails
  cd /docker || exit 1

  # Recreate container using docker-compose
  docker compose up -d --force-recreate rathole

  recreate_result=$?
  
  if [ $recreate_result -eq 0 ]; then
    message="Container $container_name recreated successfully."
  else
    message="Error: Failed to recreate container $container_name."
  fi
  echo "$message"
  send_discord_notification "$message"
else
  message="Error string $error_string not found in logs. Container $container_name seems healthy."
  echo "$message"
  send_discord_notification "$message"
fi
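
Two parts of the script above can be made a little more defensive. The sketch below is not part of the original script; the helper names has_timeout_error, json_escape and build_payload are made up for illustration. It matches the error as a fixed string rather than a regular expression, and escapes backslashes and double quotes before the message is embedded in the JSON payload, so a message containing a quote does not produce invalid JSON. Multi-line messages are not handled.

```shell
# Error string to search for (same as in the script above).
error_string="Connection timed out (os error 110)"

# Read log text on stdin; succeed if the error string occurs anywhere.
# -F treats the pattern as a literal string, -q suppresses output.
has_timeout_error() {
  grep -qF -- "$error_string"
}

# Escape backslashes and double quotes so the text is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the Discord webhook payload for a single-line message.
build_payload() {
  printf '{"username": "Rathole", "content": "%s"}' "$(json_escape "$1")"
}

# Example usage with a made-up log line:
if printf '%s\n' 'WARN rathole::client: Connection timed out (os error 110)' | has_timeout_error; then
  build_payload 'Error "timed out" found in logs.'
  echo
fi
```

In the real script, the log text would come from docker logs instead of printf. Passing docker logs an option such as --since 6h would limit the check to the window since the previous cron run.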

docker-compose.yml

  rathole:
    image: rapiz1/rathole:latest
    container_name: rathole
    networks:
      - proxy    
    restart: unless-stopped
    command: --client /app/config.toml
    volumes:
      - $DOCKERDIR/rathole/client.toml:/app/config.toml

client.toml

# client.toml
[client]
remote_addr = "x.x.x.x:2333" # The address of the server. The port must match the port in `server.bind_addr`
heartbeat_timeout = 20 # Optional. Set to 0 to disable the application-layer heartbeat test. The value must be greater than `server.heartbeat_interval`. Default: 40 seconds
retry_interval = 1 # Optional. The interval between attempts to reconnect to the server. Default: 1 second

[client.services.galax_https]
token = "" # Must be the same with the server to pass the validation
local_addr = "traefik:443" # The address of the service that needs to be forwarded

server.toml

[server]
bind_addr = "0.0.0.0:2333" # `2333` specifies the port that rathole listens on for clients
heartbeat_interval = 10

[server.services.galax_https]
token = "" # Token that is used to authenticate the client for the service. Change to an arbitrary value.
bind_addr = "0.0.0.0:443" # `443` specifies the port that exposes `galax_https` to the Internet
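
The comment in client.toml says heartbeat_timeout must be greater than server.heartbeat_interval, and that 0 disables the check. As a rough illustration of that rule only, here is a small shell check. The function name heartbeat_ok is hypothetical, and the values are typed in by hand rather than parsed from the TOML files.

```shell
# heartbeat_ok CLIENT_TIMEOUT SERVER_INTERVAL
# Succeeds if the client timeout is 0 (heartbeat check disabled)
# or strictly greater than the server heartbeat interval.
heartbeat_ok() {
  timeout="$1"
  interval="$2"
  [ "$timeout" -eq 0 ] && return 0
  [ "$timeout" -gt "$interval" ]
}

# Values from the configs in this post: client timeout 20, server interval 10.
if heartbeat_ok 20 10; then
  echo "heartbeat settings are consistent"
fi
```

With the settings from the first post (heartbeat_timeout = 0 on the client, heartbeat_interval = 30 on the server) the check also passes, but only because the client-side heartbeat test is switched off entirely.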

May 30 '24 10:05 just5ky