interfaces: improve rc.newwanip(v6) resilience
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [x] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue
Is your feature request related to a problem? Please describe.
As currently implemented, rc.newwanip(v6) heavily disrupts the running system despite attempting to reduce the workload by checking whether the IP has changed. The caching has improved a lot over the last year(s), but it does not prevent destructive manipulation of interfaces such as GRE/GIF/bridge/6to4/6rd etc.
Describe the solution you like
Extend the caching of the current IP address to prevent most, if not all, of the work being done by the scripts.
Describe alternatives you considered
Leaving as is? ;)
Additional context
https://forum.opnsense.org/index.php?topic=29698.0 https://forum.opnsense.org/index.php?topic=29605.0 https://forum.opnsense.org/index.php?topic=29556.0 etc.
@maurice-w the latest changes in master are promising for IPv6:
[periodic work from rtsold_resolvconf.sh]
2022-08-12T07:46:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T07:55:31+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:01:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:09:44+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:16:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:25:58+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:31:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:39:58+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:46:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T08:54:10+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-12T09:01:12+02:00 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
[issued rc.reload_all manually]
2022-08-12T09:05:11+02:00 /usr/local/etc/rc.newwanipv6: Failed to detect IP for WAN[wan]
[rc.newwanipv6 cleared cached IP -- that is a new mechanic]
2022-08-12T09:05:12+02:00 /usr/local/etc/rc.newwanipv6: Failed to detect IP for WAN[wan]
2022-08-12T09:05:12+02:00 /usr/local/etc/rc.newwanipv6: Failed to detect IP for WAN[wan]
2022-08-12T09:05:13+02:00 /usr/local/etc/rc.newwanipv6: On (IP address: 2003:XXXX vs. ) (interface: WAN[wan]) (real interface: igb1).
[rc.newwanipv6 picked up new address]
If you can give this a try in your test lab as well, that would be helpful. The same change should be in IPv4 for 23.1, although that is probably a bit more intrusive for several reasons. But we still have a lot of time. ;)
Cheers, Franco
@fichtner, I applied opnsense-patch d9609ec 6043b5b on top of 23.1.a_59 on a VM with a SLAAC WAN interface.
When the first (solicited) RA is received, the interface autoconfigures its address properly, but rc.newwanipv6 fails to detect the address. Maybe rc.newwanipv6 gets invoked too early?
2022-08-13T12:39:25 /usr/local/etc/rc.newwanipv6: Failed to detect IP for WAN[wan]
When the second (unsolicited) RA is received, a "new" interface address is detected (which is not actually new):
2022-08-13T12:40:35 /usr/local/etc/rc.newwanipv6: On (IP address: 2a02:3038:412:e7f9:215:5dff:fed2:761e vs. ) (interface: WAN[wan]) (real interface: hn1).
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: entering configure using 'wan'
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: IPv6 default gateway set to wan
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::588e:85ff:fe79:9ced
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: removing /tmp/hn1_defaultgwv6
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: creating /tmp/hn1_defaultgwv6 using 'fe80::588e:85ff:fe79:9ced%hn1'
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::588e:85ff:fe79:9ced%hn1'
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: The WAN_SLAAC monitor address is empty, skipping.
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: removing /tmp/hn1_defaultgwv6
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: creating /tmp/hn1_defaultgwv6 using 'fe80::588e:85ff:fe79:9ced%hn1'
2022-08-13T12:40:37 /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::588e:85ff:fe79:9ced%hn1'
All subsequent (unsolicited) RAs are handled properly. Nice!
2022-08-13T12:49:35 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-13T12:56:09 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-13T13:01:28 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
2022-08-13T13:09:59 /usr/local/etc/rc.newwanipv6: No IP change detected for WAN[wan]
Cheers Maurice
@maurice-w Promising, thanks a lot for testing. Could it be that the address is still “tentative”, which we ignore for service-bind-related reasons? Turning off DAD or excluding the tentative check in interfaces_primary_address6() should tell us whether that is the case.
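To illustrate the tentative check discussed here, the following is a minimal Python sketch, not the actual interfaces_primary_address6() code. It assumes ifconfig-style text in which an address still undergoing DAD carries the "tentative" flag. The function name usable_inet6_addresses, the include_tentative parameter and the sample output are made up for illustration.

```python
from typing import List


def usable_inet6_addresses(ifconfig_output: str,
                           include_tentative: bool = False) -> List[str]:
    """Return inet6 addresses from ifconfig-style text.

    Addresses flagged "tentative" (DAD not finished yet) are skipped
    unless include_tentative is set. Hypothetical helper for illustration.
    """
    addresses = []
    for line in ifconfig_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "inet6":
            continue
        if "tentative" in fields[2:] and not include_tentative:
            continue
        # Drop a zone suffix such as "%hn1" from link-local addresses.
        addresses.append(fields[1].split("%", 1)[0])
    return addresses


# Made-up sample using a documentation prefix, not output from a real system.
SAMPLE = """\
hn1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tinet6 fe80::215:5dff:fed2:761e%hn1 prefixlen 64 scopeid 0x6
\tinet6 2001:db8:abc:1:215:5dff:fed2:761e prefixlen 64 tentative autoconf
"""

print(usable_inet6_addresses(SAMPLE))
print(usable_inet6_addresses(SAMPLE, include_tentative=True))
```

With the tentative check in place, the global address is invisible until DAD completes, which would match a "Failed to detect IP" on the first run.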
@fichtner You nailed it! It works when setting net.inet6.ip6.dad_count=0.
How to solve this properly? Might be best to wait until DAD is completed.
@maurice-w might this also be an issue for https://github.com/opnsense/core/issues/5946 ?
We have no means to listen for tentative removal, so it needs to be worked around somehow. Either by polling, or a fixed delay, or just using the tentative one here... none of these seems particularly nice. I need to think about it.
@fichtner I don't think #5946 is caused by a tentative interface address. The main issue there is that the _routerv6 file doesn't get created, which should happen before rc.newwanipv6 is invoked. I'm pretty sure rtsold doesn't properly bind to IPv6-only interfaces after a reboot for some reason.
A delay would be best. If I understand RFC 4862 correctly, DAD is completed DupAddrDetectTransmits * RetransTimer milliseconds after sending the initial Neighbor Solicitation. The defaults for these values are 1 * 1000 ms. They can be changed by tunables or Router Advertisements, but simply waiting 1 second before we invoke rc.newwanipv6 here might be okay: https://github.com/opnsense/core/blob/c9bdc3d16245c89f04072e7c3cafb178746634f7/src/opnsense/scripts/interfaces/rtsold_resolvconf.sh#L73
@maurice-w making the sleep local to rtsold script is a great idea... 7627802 is how FreeBSD deals with this.
@fichtner Ah, interesting. Assume a static RetransTimer of 1000 ms, use the actually configured number of Neighbor Solicitations and add 1 second just to make sure. Seems pragmatic. I'll test this, but might take a while since I dump & spin up my test VMs frequently and have lost track of all the patches again... 😳
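The delay arithmetic described above can be sketched as follows. This is a rough Python illustration, not the shell code from 7627802. It assumes the number of Neighbor Solicitations comes from net.inet6.ip6.dad_count and is passed in by the caller. The function name dad_wait_seconds and its parameters are hypothetical.

```python
def dad_wait_seconds(dad_count: int,
                     retrans_timer_ms: int = 1000,
                     margin_s: int = 1) -> int:
    """Seconds to wait before reading the address after an RA.

    dad_count        -- configured number of DAD Neighbor Solicitations
    retrans_timer_ms -- assumed static RetransTimer (RFC 4862 default: 1000 ms)
    margin_s         -- extra second added "just to make sure"
    """
    dad_ms = dad_count * retrans_timer_ms
    # Round up to whole seconds, since sleep(1) style delays are integral here.
    dad_s = -(-dad_ms // 1000)
    return dad_s + margin_s


print(dad_wait_seconds(1))  # default tunable: 1 * 1000 ms + 1 s margin
print(dad_wait_seconds(3))
```

With the default of one Neighbor Solicitation this gives a two-second wait. A RetransTimer changed by Router Advertisements is deliberately ignored in this sketch, as in the approach described above.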
Most of the patches should be in master, no?
@fichtner You're right, I forgot about opnsense-code. Now on 23.1.a_101 and 7627802 did the trick! We're getting there. :-)
One more slight issue:
/usr/local/etc/rc.newwanipv6: The command '/sbin/route add -host -'inet6' '2001:db8:abc::1' 'fe80::215:5dff:fed2:761d%hn1:slaac'' returned exit code '71', the output was 'route: fe80::215:5dff:fed2:761d%hn1:slaac: Name does not resolve'
2001:db8:abc::1 is the nameserver from /tmp/hn1:slaac_nameserverv6 (advertised by the upstream router). Need to get rid of the :slaac suffix when adding this route.
[edit] "Allow DNS server list to be overridden by DHCP/PPP on WAN" was enabled in this test. [/edit]
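As an illustration of the suffix problem, a minimal Python sketch could strip the ":slaac" part from the zone of a scoped link-local address before handing it to route. This is only an assumption about one possible approach, not the fix made in the actual code. The helper name strip_zone_suffix is hypothetical.

```python
def strip_zone_suffix(address: str, suffix: str = ":slaac") -> str:
    """Remove a trailing suffix from the zone (the part after "%").

    Addresses without a zone, or whose zone lacks the suffix, are
    returned unchanged. Hypothetical helper for illustration only.
    """
    host, sep, zone = address.partition("%")
    if sep and zone.endswith(suffix):
        zone = zone[: -len(suffix)]
    return host + sep + zone


print(strip_zone_suffix("fe80::215:5dff:fed2:761d%hn1:slaac"))
print(strip_zone_suffix("2001:db8:abc::1"))
```

Only the zone is touched, so the colons inside the IPv6 address itself are never mistaken for the suffix separator.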
Cheers Maurice
@maurice-w nice catch, 29e6e12d7c6a deals with that (it's the only function where this is currently an issue since it reads the flat files given by ifctl)
I suppose this is taken care of now.