vyos-1x
vyos-1x copied to clipboard
T5872: ipsec remote access VPN: support dhcp-interface.
Change Summary
Enables the dhcp-interface option to be used with ipsec remote-access VPNs.
Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Code style update (formatting, renaming)
- [ ] Refactoring (no functional changes)
- [ ] Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
- [ ] Other (please describe):
Related Task(s)
- https://vyos.dev/T5872
Related PR(s)
Component(s) name
ipsec remote-access
Proposed changes
This adds the ability to use dhcp-interface with the ipsec remote-access VPN. It works essentially the same way as the dhcp-interface behavior for the ipsec site-to-site VPN.
While most IKEv2 Remote Access ("Road Warrior") VPN setups will likely be at sites with a fixed IP, the DHCP functionality may be useful for isolated installs, say, at a small business or remote branch office that may be on an ISP package that only offers DHCP addressing. Combined with dynamic DNS, VyOS can still provide a one-stop routing and remote access solution.
How to test
Example configuration:
[edit vpn ipsec remote-access connection ClientVPN]
lucas@lcn-router# show
authentication {
client-mode eap-tls
local-id <local id>
server-mode x509
x509 {
ca-certificate <ca certificate name>
certificate <server certificate name>
}
}
dhcp-interface eth0
...
The ipsec daemon should start normally and accept connections.
If you inspect /etc/swanctl/swanctl.conf on the running router, you should see a peer configuration that begins with the following (replace 1.2.3.4 with the router's WAN interface IP on eth0):
ra-ClientVPN {
remote_addrs = %any
local_addrs = 1.2.3.4 # dhcp:eth1
You can simulate the DHCP address change behavior by manually invoking the dhclient exit hook:
sudo interface=eth1 old_ip_address=1.2.3.4 new_ip_address=5.6.7.8 /etc/dhcp/dhclient-exit-hooks.d/99-ipsec-dhclient-hook
Smoketest result
DEBUG - Running Testcase: /usr/libexec/vyos/tests/smoke/cli/test_vpn_ipsec.py
DEBUG - test_01_dhcp_fail_handling (__main__.TestVPNIPsec.test_01_dhcp_fail_handling) ... ok
DEBUG - test_02_site_to_site (__main__.TestVPNIPsec.test_02_site_to_site) ... ok
DEBUG - test_03_site_to_site_vti (__main__.TestVPNIPsec.test_03_site_to_site_vti) ... ok
DEBUG - test_04_dmvpn (__main__.TestVPNIPsec.test_04_dmvpn) ... ok
DEBUG - test_05_x509_site2site (__main__.TestVPNIPsec.test_05_x509_site2site) ... ok
DEBUG - test_06_flex_vpn_vips (__main__.TestVPNIPsec.test_06_flex_vpn_vips) ... ok
DEBUG - test_07_ikev2_road_warrior (__main__.TestVPNIPsec.test_07_ikev2_road_warrior) ... ok
DEBUG - test_08_ikev2_road_warrior_client_auth_eap_tls (__main__.TestVPNIPsec.test_08_ikev2_road_warrior_client_auth_eap_tls) ... ok
DEBUG - test_09_ikev2_road_warrior_client_auth_x509 (__main__.TestVPNIPsec.test_09_ikev2_road_warrior_client_auth_x509) ... ok
DEBUG - test_10_ikev2_road_warrior_dhcp_fail_handling (__main__.TestVPNIPsec.test_10_ikev2_road_warrior_dhcp_fail_handling) ... ok
Checklist:
- [x] I have read the CONTRIBUTING document
- [x] I have linked this PR to one or more Phabricator Task(s)
- [x] I have run the components SMOKETESTS if applicable
- [x] My commit headlines contain a valid Task id
- [x] My change requires a change to the documentation
- [ ] I have updated the documentation accordingly
What happens on dhcp renew when you receive a NEW IP address?
Can you test this please?
It's like you know where all the skeletons are buried ;)
I did run this on my lab system, and it seemed to persist an IP change, but after poking around with the hook script a bit more directly I must have just gotten lucky that some other change I was experimenting with caused the swanctl config to re-render.
That script is definitely quite broken. It appears it may even be broken all the way back to 1.3 too (due to ipsec statusall command outputting peer- instead of peer_, at least on 1.3.5 where I tested).
I pushed a fix here that should be valid for 1.4 and 1.5 under the current ticket (T5872). If you have another one open for this issue already let me know and I can re-word the second commit.
Thanks for the proposed change in src/etc/dhcp/dhclient-exit-hooks.d/99-ipsec-dhclient-hook
I think it's not a good idea to mangle the service definition files underneath - @sarthurdev please correct me.
I would rather see if the dhclient lease changed, and if so - call /usr/libexec/vyos/conf_mode/vpn_ipsec.py so the entire cofig is regenerated from a single point to not leaf space for any arbitrary errors in scripts.
I agree the mangling feels like a bit of a hack left over from the 1.2 days. It would also remove the need for error-prone regular expressions (including the new "hack" I added to put the connection name on the same line as the DHCP interface comment to make it easy to find the right connection name to reset).
Would there be any race conditions if someone was mutating the config in config-mode at the time the hook executes?
edit to add: we already invoke the conf_mode script in the case of when the DHCP IP is not available at configure time, if we can do this for IP changes can also remove the need for the /tmp/ipsec_dhcp_waiting file.
You can safe from such races by checking if a commit is in progress
https://github.com/vyos/vyos-1x/blob/current/python/vyos/utils/commit.py
I think it's not a good idea to mangle the service definition files underneath - @sarthurdev please correct me.
Altering the generated files from the hook would likely be more performant than calling the conf script again, but does add complexity (linking interface/connection via comments in-file).
I agree that simplifying the hook to just call the conf script is likely an overall safer solution (with the commit in progress check).
The exit hook I pushed that does the mangling is proving to still be pretty broken, so added a few more fixes. In particular, the logic that tires to clear old connections/SAs is proving to be quite dangerous. If the IP actually changes, a new connection will come up after clearing the connection and reloading swanctl. But if the IP does not change, swanctl recognizes that the configuration is still the same when it is asked to reload so the connection stays in the "terminated" state.
It's not really clear to me that logic is still needed. At least for site-to-site peers, swanctl seems to reliably clear old SAs when it reloads and the configuration is changed (at least when using connection-type: initiate). For road warrior, existing child SAs stay up until they time out (unless the client can actually re-resolve and re-negotiate after the IP change, I'm still trying to test if any real-world clients have that behavior).
To be honest - IKEv2 road warriors terminating at a dynamic endpoint with changing DHCP IPs and a ddns record needing an update can handle a bit of downtime, as also the DNS update requires some time.
IMHO this should not be overengineered.
Such a setup is nice for a lab but broken by design for production/enterprise use
@c-po Thanks for your patience while time to focus on this has been hard to come by over here. I just pushed up an update to the hook that dramatically simplifies it. Do take a look at my attempt at race condition handling though—I don't think it's 100% robust but hopefully better than nothing.
I deployed this on my system and am letting it soak for a bit. In general swanctl's reload mechanism seems to work fine for site-to-site connections. If it detects a change it will tear down the old connection and re-create. For road-warrior configurations, it does not seem to tear down the old connections. Like you said though, that probably doesn't need to be over-engineered.
ignore my paranoia
famous last words ;)
That being said, I think we've shaken out all the issues here. Also going to add the "bug fix" checkbox in the description since that hook re-write also fixes the functionality for site-to-site users, which I guess none of us (myself included) realized was broken before...
@c-po Is this clear to merge or are we waiting on any further input? Also, thoughts on backporting to 1.4?
@c-po Is this clear to merge or are we waiting on any further input? Also, thoughts on backporting to 1.4?
I‘m waiting on a review from @sarthurdev, after that is done we can backport this of course
While we wait for review on this, may be worth taking a look at https://vyos.dev/T6177 which I just filed. I am going to try temporarily changing the DHCP hook to a no-op to see if that increases the time between crashes (I do not believe they are related as the hook only reloads strongswan if the IP actually changes, which I do not believe has happened). Currently the issue seems to reproduce roughly once every 7 days.
@mergifyio backport sagitta
backport sagitta
✅ Backports have been created
- #3204 T5872: ipsec remote access VPN: support dhcp-interface. (backport #2965) has been created for branch
sagitta
I am going to try temporarily changing the DHCP hook to a no-op to see if that increases the time between crashes (I do not believe they are related as the hook only reloads strongswan if the IP actually changes, which I do not believe has happened). Currently the issue seems to reproduce roughly once every 7 days.
Tested this and still experienced a crash consistent with https://vyos.dev/T6177. So that regression was not introduced here. Continuing to investigate and will add further findings to that ticket.