interfaces: improve rc.newwanip(v6) resilience

Open fichtner opened this issue 2 years ago • 11 comments

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

  • [x] I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
  • [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue

Is your feature request related to a problem? Please describe.

As currently implemented, rc.newwanip(v6) heavily disrupts the running system despite attempting to reduce the workload by checking whether the IP changed. The caching has improved a lot over the last year(s), but it does not prevent destructive manipulation of interfaces such as GRE/GIF/bridge/6to4/6rd.

Describe the solution you'd like

Extend the caching of the current IP address to prevent most, if not all, of the work being done by the scripts.
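A minimal sketch of the idea, in Python rather than the PHP the scripts are written in: compare the detected address against a cached copy and only signal "do the work" when it differs. The function name, the cache file layout, and the policy of clearing the cache when detection fails are assumptions for illustration, not the actual OPNsense implementation.

```python
from pathlib import Path
from typing import Optional


def address_changed(cache_file: Path, current: Optional[str]) -> bool:
    """Return True only when a newly detected address differs from the cache.

    Hypothetical helper: the real scripts keep their own cache files.
    """
    cached = cache_file.read_text().strip() if cache_file.exists() else ""
    if not current:
        # Detection failed: drop the cache so that the next successful
        # detection is treated as a change (assumed policy).
        cache_file.unlink(missing_ok=True)
        return False
    if current == cached:
        # Same address as before: the caller can skip all reconfiguration.
        return False
    cache_file.write_text(current + "\n")
    return True
```

A caller would run the expensive reconfiguration (tunnels, bridges, routes) only when this returns True.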

Describe alternatives you considered

Leaving as is? ;)

Additional context

https://forum.opnsense.org/index.php?topic=29698.0
https://forum.opnsense.org/index.php?topic=29605.0
https://forum.opnsense.org/index.php?topic=29556.0
etc.

fichtner, Aug 05 '22 09:08

@maurice-w the latest changes in master are promising for IPv6:

[periodic work from rtsold_resolvconf.sh]
2022-08-12T07:46:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T07:55:31+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:01:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:09:44+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:16:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:25:58+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:31:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:39:58+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:46:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:54:10+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T09:01:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
[issued rc.reload_all manually] 
2022-08-12T09:05:11+02:00 /usr/local/etc/rc.newwanipv6: Failed to detect IP for WAN[wan]
[rc.newwanipv6 cleared cached IP -- that is a new mechanic]
2022-08-12T09:05:12+02:00 /usr/local/etc/rc.newwanipv6: Failed to detect IP for WAN[wan]
2022-08-12T09:05:12+02:00 /usr/local/etc/rc.newwanipv6: Failed to detect IP for WAN[wan]
2022-08-12T09:05:13+02:00 /usr/local/etc/rc.newwanipv6: On (IP address: 2003:XXXX vs. ) (interface: WAN[wan]) (real interface: igb1).
[rc.newwanipv6 picked up new address]

If you can give this a try in your test lab as well, that would be helpful. The same change should be in IPv4 for 23.1, although that is probably a bit more intrusive for several reasons. But we still have a lot of time. ;)

Cheers, Franco

fichtner, Aug 12 '22 07:08

@fichtner, I applied opnsense-patch d9609ec 6043b5b on top of 23.1.a_59 on a VM with a SLAAC WAN interface.

When the first (solicited) RA is received, the interface autoconfigures its address properly, but rc.newwanipv6 fails to detect the address. Maybe rc.newwanipv6 gets invoked too early?

2022-08-13T12:39:25 /usr/local/etc/rc.newwanipv6: Failed to detect IP for WAN[wan]

When the second (unsolicited) RA is received, a "new" interface address is detected (which is not actually new):

2022-08-13T12:40:35 /usr/local/etc/rc.newwanipv6: On (IP address: 2a02:3038:412:e7f9:215:5dff:fed2:761e vs. ) (interface: WAN[wan]) (real interface: hn1).
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: entering configure using 'wan'
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: IPv6 default gateway set to wan
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::588e:85ff:fe79:9ced
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: removing /tmp/hn1_defaultgwv6
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: creating /tmp/hn1_defaultgwv6 using 'fe80::588e:85ff:fe79:9ced%hn1'
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::588e:85ff:fe79:9ced%hn1'
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: The WAN_SLAAC monitor address is empty, skipping.
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: removing /tmp/hn1_defaultgwv6
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: creating /tmp/hn1_defaultgwv6 using 'fe80::588e:85ff:fe79:9ced%hn1'
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::588e:85ff:fe79:9ced%hn1'

All subsequent (unsolicited) RAs are handled properly. Nice!

2022-08-13T12:49:35 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-13T12:56:09 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-13T13:01:28 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-13T13:09:59 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]

Cheers Maurice

maurice-w, Aug 13 '22 11:08

@maurice-w Promising, thanks a lot for testing. Could it be that the address is still “tentative”, which we ignore for service-bind-related reasons? Turning off DAD or excluding the tentative check in interfaces_primary_address6() should tell us whether that is the case.
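To illustrate what ignoring a tentative address means, here is a rough Python sketch that picks the first non-link-local IPv6 address from `ifconfig`-style output and skips entries flagged `tentative`. The function name and the `allow_tentative` switch are hypothetical; this is not the code of interfaces_primary_address6().

```python
import ipaddress
from typing import Optional


def primary_address6(ifconfig_output: str, allow_tentative: bool = False) -> Optional[str]:
    """Return the first usable global IPv6 address, or None if there is none."""
    for line in ifconfig_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "inet6":
            continue
        address, flags = fields[1], fields[2:]
        # Strip the zone ("%hn1") before classifying the address.
        if ipaddress.ip_address(address.split("%")[0]).is_link_local:
            continue
        if "tentative" in flags and not allow_tentative:
            # DAD has not finished yet, so the address is not reported.
            continue
        return address
    return None
```

With an address that is still tentative, the default call returns None, which would match a "Failed to detect IP" log line; passing `allow_tentative=True` returns the address.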

fichtner, Aug 13 '22 12:08

@fichtner You nailed it! It works when setting net.inet6.ip6.dad_count=0. How to solve this properly? Might be best to wait until DAD is completed.

maurice-w, Aug 13 '22 14:08

@maurice-w might this also be an issue for https://github.com/opnsense/core/issues/5946 ?

We have no means to listen for tentative removal, so it needs to be worked around somehow: by polling, by a fixed delay, or by just using the tentative address here... none of these seems particularly nice. I need to think about it.
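For comparison with a fixed delay, the polling option could look roughly like the Python sketch below. The `is_tentative` callback, the timeout, and the interval are all assumptions; how the tentative flag is actually queried is left to the caller.

```python
import time
from typing import Callable


def wait_until_not_tentative(
    is_tentative: Callable[[], bool],
    timeout: float = 3.0,
    interval: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the address is no longer tentative or the timeout expires.

    Returns True when DAD finished in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while is_tentative():
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
    return True
```

Polling returns as soon as DAD completes but costs repeated interface queries, while a fixed delay is simpler but always waits the full time.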

fichtner, Aug 15 '22 07:08

@fichtner I don't think #5946 is caused by a tentative interface address. The main issue there is that the _routerv6 file doesn't get created, which should happen before rc.newwanipv6 is invoked. I'm pretty sure rtsold doesn't properly bind to IPv6-only interfaces after a reboot for some reason.

A delay would be best. If I understand RFC 4862 correctly, DAD is completed DupAddrDetectTransmits * RetransTimer milliseconds after sending the initial Neighbor Solicitation. The defaults for these values are 1 * 1000 ms. They can be changed by tunables or Router Advertisements, but simply waiting 1 second before we invoke rc.newwanipv6 here might be okay: https://github.com/opnsense/core/blob/c9bdc3d16245c89f04072e7c3cafb178746634f7/src/opnsense/scripts/interfaces/rtsold_resolvconf.sh#L73

maurice-w, Aug 15 '22 22:08

@maurice-w making the sleep local to the rtsold script is a great idea... 7627802 is how FreeBSD deals with this.

fichtner, Aug 16 '22 08:08

@fichtner Ah, interesting. Assume a static RetransTimer of 1000 ms, use the actually configured number of Neighbor Solicitations and add 1 second just to make sure. Seems pragmatic. I'll test this, but it might take a while since I dump & spin up my test VMs frequently and have lost track of all the patches again... 😳
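The two delay estimates discussed here can be written down as a small Python sketch. It follows the description in this thread only (RFC 4862 timing versus "configured count plus one second"); the function names are hypothetical and the code in 7627802 may differ in detail.

```python
def rfc4862_dad_ms(dup_addr_detect_transmits: int = 1, retrans_timer_ms: int = 1000) -> int:
    """DAD duration after the first Neighbor Solicitation, per the reading above."""
    return dup_addr_detect_transmits * retrans_timer_ms


def pragmatic_delay_s(dad_count: int) -> int:
    """Assume RetransTimer is 1 s, use the configured count, add 1 s of margin."""
    if dad_count < 0:
        raise ValueError("dad_count must not be negative")
    return dad_count * 1 + 1


# With the defaults, DAD takes about 1000 ms and the pragmatic wait is 2 s.
print(rfc4862_dad_ms(), pragmatic_delay_s(1))
```

Here `dad_count` stands for the value of the net.inet6.ip6.dad_count tunable mentioned earlier; with DAD disabled (0) this sketch would still wait 1 second.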

maurice-w, Aug 16 '22 09:08

Most of the patches should be in master, no?

fichtner, Aug 16 '22 09:08

@fichtner You're right, I forgot about opnsense-code. Now on 23.1.a_101 and 7627802 did the trick! We're getting there. :-)

One more slight issue:

/usr/local/etc/rc.newwanipv6: The command '/sbin/route add -host -'inet6' '2001:db8:abc::1' 'fe80::215:5dff:fed2:761d%hn1:slaac'' returned exit code '71', the output was 'route: fe80::215:5dff:fed2:761d%hn1:slaac: Name does not resolve'

2001:db8:abc::1 is the nameserver from /tmp/hn1:slaac_nameserverv6 (advertised by the upstream router). Need to get rid of the :slaac suffix when adding this route.
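A rough Python sketch of the normalization needed: keep the link-local address and the real device name as the zone, and drop anything after a colon in the zone (such as `:slaac`). The helper name is hypothetical and this is not the actual fix applied in the repository.

```python
def strip_device_suffix(scoped_address: str) -> str:
    """Turn 'fe80::1%hn1:slaac' into 'fe80::1%hn1'; leave unscoped addresses alone."""
    address, sep, zone = scoped_address.partition("%")
    if not sep:
        # No zone present, e.g. a global address: nothing to strip.
        return scoped_address
    device = zone.split(":", 1)[0]
    return f"{address}%{device}"


print(strip_device_suffix("fe80::215:5dff:fed2:761d%hn1:slaac"))
# prints fe80::215:5dff:fed2:761d%hn1
```

The resulting gateway string is one that `route` can resolve, because the zone is then a plain interface name.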

[edit] "Allow DNS server list to be overridden by DHCP/PPP on WAN" was enabled in this test. [/edit]

Cheers Maurice

maurice-w, Aug 16 '22 23:08

@maurice-w nice catch, 29e6e12d7c6a deals with that (it's the only function where this is currently an issue since it reads the flat files given by ifctl)

fichtner, Aug 17 '22 05:08

I suppose this is taken care of now.

fichtner, Nov 06 '22 09:11