infix

R2S rousette crash on boot if LAN not connected

Open minexn opened this issue 1 year ago • 11 comments

Current Behavior

The system booted with WAN connected and LAN disconnected.

lan             ethernet   DOWN        b2:b2:41:23:71:82                        
                ipv4                   192.168.2.1/24 (static)
wan             ethernet   UP          b2:b2:41:23:71:83                        
                ipv4                   10.10.3.101/24 (dhcp)

Rousette keeps crashing:

Oct 22 06:47:02 r2s rousette[1883]: [2024-10-22 06:47:02.245] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:47:02 r2s rousette[1883]: [2024-10-22 06:47:02.248] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 
Oct 22 06:47:07 r2s finit[1]: Service rousette keeps crashing, not restarting.

The system booted with WAN and LAN connected.

lan             ethernet   UP          b2:b2:41:23:71:82                        
                ipv4                   192.168.2.1/24 (static)
wan             ethernet   UP          b2:b2:41:23:71:83                        
                ipv4                   10.10.3.101/24 (dhcp)

Rousette starts and responds to queries.

Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.645] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.662] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 

Expected Behavior

Rousette starts and responds to queries.

Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.645] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.662] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 

Steps To Reproduce

1. Load v24.10.1
2. Unplug LAN
3. Reboot
4. Check the log

Additional information

Factory configuration

minexn, Oct 22 '24

Reproduced on my R2S:

Oct 23 05:35:26 r2s finit[1]: Service rousette[2080] died, restarting in 5000 msec (10/10)
Oct 23 05:35:27 r2s finit[1]: Starting rousette[2163]
Oct 23 05:35:27 r2s rousette[2163]: [2024-10-23 05:35:27.538] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 23 05:35:27 r2s rousette[2163]: [2024-10-23 05:35:27.541] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 
Oct 23 05:35:27 r2s rousette[2163]: terminate called after throwing an instance of 'std::runtime_error'
Oct 23 05:35:27 r2s rousette[2163]:   what():  Server error: Host not found (authoritative)
Oct 23 05:35:32 r2s finit[1]: Service rousette keeps crashing, not restarting.

troglobit, Oct 23 '24

The workaround suggested by @mattiaswal, enabling IPv6 on the wan interface, helps:

admin@r2s:/cfg$ diff backup.cfg startup-config.cfg 
--- backup.cfg
+++ startup-config.cfg
@@ -39,7 +39,8 @@
       },
       {
         "name": "wan",
-        "type": "infix-if-type:ethernet"
+        "type": "infix-if-type:ethernet",
+        "ietf-ip:ipv6": {}
       }
     ]
   },

troglobit, Oct 23 '24

When I try to mimic the same setup in QEMU with the x86_64 build, disabling IPv6 on all Ethernet interfaces, I cannot reproduce the problem. Very odd; I need to discuss this further with @mattiaswal.

troglobit, Oct 23 '24

After discussions with @mattiaswal and the rest of the core team, we decided yesterday to check whether this is also an issue with the standard aarch64 builds on tier-one customer HW (Marvell CRB derivatives).

These tests concluded this morning without any problems.

So, it seems this issue is localized to the R2S build.

troglobit, Oct 24 '24

I also had this issue, with rousette bailing out with:

rousette[1957]: terminate called after throwing an instance of 'std::runtime_error'
rousette[1957]: what(): Server error: Host not found (authoritative)

It turns out that Boost.Asio is not willing to resolve a numeric IPv6 host (::1) because the resolver's query flags default to address_configured (AI_ADDRCONFIG), which only returns IPv6 addresses if a non-loopback IPv6 address is configured on the system. See https://www.boost.org/doc/libs/1_83_0/doc/html/boost_asio/reference/ip__resolver_base.html for more info.
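To see the flag difference outside of rousette, here is a minimal sketch that bypasses Boost entirely and calls getaddrinfo() directly. It assumes a POSIX/Linux system and that Boost.Asio's address_configured and numeric_host flags correspond to getaddrinfo()'s AI_ADDRCONFIG and AI_NUMERICHOST. The helper name try_resolve() and the port "443" are hypothetical, chosen only for illustration.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <cstring>

// Hypothetical helper: resolve host:port for TCP with the given getaddrinfo()
// flags and discard the result. Returns 0 on success or an EAI_* error code.
//   AI_ADDRCONFIG  ~ Boost.Asio's default resolver flag (address_configured)
//   AI_NUMERICHOST ~ Boost.Asio's numeric_host flag
int try_resolve(const char* host, const char* port, int flags) {
    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6 results
    hints.ai_socktype = SOCK_STREAM;  // TCP, like tcp::resolver
    hints.ai_flags = flags;

    addrinfo* res = nullptr;
    const int rc = getaddrinfo(host, port, &hints, &res);
    if (rc == 0) {
        freeaddrinfo(res);  // only the success/failure status matters here
    }
    return rc;
}
```

On a system where lo is the only interface with an IPv6 address, try_resolve("::1", "443", AI_ADDRCONFIG) may fail (the exact EAI_* code depends on the libc), whereas try_resolve("::1", "443", AI_NUMERICHOST) only parses the literal and returns 0.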

The following patch resolved it for me:

--- nghttp2-asio-e877868abe06a83ed0a6ac6e245c07f6f20866b5/lib/asio_server.cc
+++ nghttp2-asio-e877868abe06a83ed0a6ac6e245c07f6f20866b5/lib/asio_server.cc
@@ -82,8 +82,13 @@ boost::system::error_code server::bind_and_listen(boost::system::error_code &ec,
   // Open the acceptor with the option to reuse the address (i.e.
   // SO_REUSEADDR).
   tcp::resolver resolver(io_service_pool_.get_io_service());
   tcp::resolver::query query(address, port);
   auto it = resolver.resolve(query, ec);
+  if (ec) {
+    // Retry as a numeric host, without address_configured (AI_ADDRCONFIG).
+    tcp::resolver::query query(address, port, boost::asio::ip::resolver_query_base::numeric_host);
+    it = resolver.resolve(query, ec);
+  }
   if (ec) {
     return ec;
   }
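Whichever way the retry is written, the important detail is that the second resolve() must assign to the same it that the code after the if (ec) block uses; redeclaring it with auto inside that block would shadow the outer variable and silently drop the retried result. The same fallback can be sketched against plain getaddrinfo() instead of Boost.Asio, assuming a POSIX system. resolve_with_fallback() is a hypothetical name, not part of nghttp2-asio or rousette.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <cstring>

// Hypothetical helper (not part of nghttp2-asio or rousette): resolve
// host:port for TCP, first with AI_ADDRCONFIG (Boost.Asio's default
// address_configured flag), then, if that fails, retry with only
// AI_NUMERICHOST (Boost.Asio's numeric_host flag).
// On success returns 0 and stores the list in *out; the caller must release
// it with freeaddrinfo(). On failure returns an EAI_* code and sets *out to
// nullptr.
int resolve_with_fallback(const char* host, const char* port, addrinfo** out) {
    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG;

    addrinfo* res = nullptr;
    int rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        // Retry into the SAME variables (rc, res); declaring new locals here
        // would shadow them and the retried result would be lost.
        hints.ai_flags = AI_NUMERICHOST;
        res = nullptr;
        rc = getaddrinfo(host, port, &hints, &res);
    }
    *out = (rc == 0) ? res : nullptr;
    return rc;
}
```

For "::1" this yields an AF_INET6 entry whether or not the first AI_ADDRCONFIG attempt succeeds, which is the behavior the Boost.Asio retry is aiming for.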

sgsx3, Oct 26 '24

Nice catch! Do you think you could try to get this patch upstream, so we can backport it to Infix? I'm a bit unsure of the state of that project upstream, though. Do you know more, @mattiaswal?

troglobit, Oct 26 '24

Bumping back to CCB

troglobit, Feb 15 '25

CCB: Retest with buildroot 25.02.

jovatn, Feb 17 '25

Maybe this is a kernel config error or something? It does not occur on any other platform. Moving the R2S to the standard aarch64 image may solve this.

mattiaswal, Jul 30 '25

Good point, and with the new board package framework this would be quite simple. I can have a look after the weekend 👍

troglobit, Jul 30 '25

With the R2S now merged into the default aarch64 build, I'm picking this one up again to see if I can reproduce it. The likely root cause was differing kernel configs, which should no longer be a problem.

troglobit, Nov 10 '25

Tested with v25.10.0-175-g245f31cd: Rousette starts without any Ethernet connections.

minexn, Dec 01 '25

Aaah, thanks for testing! It's been number two or three on my TODO list for a while now, but I haven't gotten around to testing it.

troglobit, Dec 01 '25