
[BUG] Second machine always reports 'Down' due to IPv6 corrosion gossip protocol timeouts

Open KernelPryanic opened this issue 1 month ago • 0 comments

Describe the bug

I have two physical machines: a Raspberry Pi and a Linux mini server. The second machine added to the cluster shows as "Down" while the first machine is "Up". It doesn't matter which one is added first; the second one is always down. Here's the log from the RPi:

user@raspberrypi: $ sudo systemctl status uncloud-corrosion
● uncloud-corrosion.service - Uncloud gossip-based distributed store
     Loaded: loaded (/etc/systemd/system/uncloud-corrosion.service; static)
     Active: active (running) since Sat 2025-12-06 18:33:25 CET; 10s ago
   Main PID: 17997 (uncloud-corrosi)
      Tasks: 12 (limit: 9443)
        CPU: 108ms
     CGroup: /system.slice/uncloud-corrosion.service
             └─17997 /usr/local/bin/uncloud-corrosion agent -c /var/lib/uncloud/corrosion/config.toml

Dec 06 18:33:25 raspberrypi uncloud-corrosion[17997]: LIST SUBQUERY 1
Dec 06 18:33:25 raspberrypi uncloud-corrosion[17997]: SCAN __corro_sub.temp_machines
Dec 06 18:33:25 raspberrypi uncloud-corrosion[17997]: USE TEMP B-TREE FOR ORDER BY
Dec 06 18:33:25 raspberrypi uncloud-corrosion[17997]:  sub_id=339517a0-3023-41fd-9422-e9ff638d94b6 sql_hash=ca8a320c793899be
Dec 06 18:33:25 raspberrypi uncloud-corrosion[17997]: 2025-12-06T17:33:25.843288Z  INFO corro_types::pubsub: Starting loop to run the subscription sub_id=339517a0-3023-41fd-9422-e9ff638d94b6
Dec 06 18:33:25 raspberrypi uncloud-corrosion[17997]: 2025-12-06T17:33:25.843292Z  INFO corro_types::pubsub: Notified condvar that the subscription is 'running' sub_id=339517a0-3023-41fd-9422-e9ff638d94b6
Dec 06 18:33:25 raspberrypi uncloud-corrosion[17997]: 2025-12-06T17:33:25.844476Z  INFO corro_types::pubsub: Deleted 0 old changes row in 71.833µs sub_id=339517a0-3023-41fd-9422-e9ff638d94b6
Dec 06 18:33:30 raspberrypi uncloud-corrosion[17997]: 2025-12-06T17:33:30.757559Z ERROR corro_agent::transport: error=deadline has elapsed
Dec 06 18:33:30 raspberrypi uncloud-corrosion[17997]: 2025-12-06T17:33:30.757579Z ERROR corro_agent::transport: error=deadline has elapsed
Dec 06 18:33:30 raspberrypi uncloud-corrosion[17997]: 2025-12-06T17:33:30.757583Z ERROR corro_agent::agent::handlers: could not write datagram [fdcc:9c41:865c:525:1a61:4df3:3d4d:4a2f]:51001: deadline has elapsed
user@computer: $ uc machine ls
NAME        STATE   ADDRESS         PUBLIC IP      WIREGUARD ENDPOINTS                        MACHINE ID
one         Down    10.210.1.1/24   *.*.*.*   192.168.*.*:51820, *.*.*.*:51820   e643e327b129dca0ec504871348116ab
two         Up      10.210.0.1/24   *.*.*.*   192.168.*.*:51820, *.*.*.*:51820   718c8ab5c9e419938d51f24e2bb14efd
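
For triage, the target of the failing datagram in the log above can be pulled out and classified with a short script. This is only a minimal sketch and not part of Uncloud or Corrosion: the function name `parse_datagram_target` and the regular expression are hypothetical, and the log format is assumed to match the single line shown above.

```python
import ipaddress
import re

# Assumed format, taken from the one log line in this report:
#   "... could not write datagram [<ipv6>]:<port>: deadline has elapsed"
TARGET_RE = re.compile(r"could not write datagram \[([0-9a-fA-F:]+)\]:(\d+)")

# IPv6 unique local address range (RFC 4193).
ULA = ipaddress.ip_network("fc00::/7")


def parse_datagram_target(line):
    """Return (IPv6Address, port) for the failed datagram target, or None."""
    m = TARGET_RE.search(line)
    if m is None:
        return None
    return ipaddress.IPv6Address(m.group(1)), int(m.group(2))


line = (
    "2025-12-06T17:33:30.757583Z ERROR corro_agent::agent::handlers: "
    "could not write datagram [fdcc:9c41:865c:525:1a61:4df3:3d4d:4a2f]:51001: "
    "deadline has elapsed"
)
addr, port = parse_datagram_target(line)
# Prints the peer address, the port, and whether the address is unique local.
print(addr, port, addr in ULA)
```

The address falls in the unique local range, so the timeout is presumably on the cluster's private IPv6 path rather than on a public address. That is an inference from the address alone, not something the log confirms.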

How to reproduce

  1. uc machine init -i /home/user/.ssh/id_rsa --name one [email protected]
  2. uc machine add -i /home/user/.ssh/id_rsa --name two --no-caddy [email protected]

Expected behavior

Both machines should show as "Up" in uc machine ls after the second one is added.

Environment:

  • Uncloud versions:
    • Control (client) node (uc --version): uc version 0.14.0
    • Uncloud daemon (from the server) (uncloudd --version): uncloudd version 0.14.0
  • OS version (uname -a):
    • Raspberry Pi: Linux raspberrypi 6.12.47+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.12.47-1+rpt1~bookworm (2025-09-16) aarch64 GNU/Linux
    • Client (control node): Linux computer 6.17.7-200.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Nov 2 17:43:34 UTC 2025 x86_64 GNU/Linux
    • Server: Linux server 6.17.4-100.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 19 19:54:21 UTC 2025 x86_64 GNU/Linux

KernelPryanic • Dec 06 '25 17:12