vyos-1x icon indicating copy to clipboard operation
vyos-1x copied to clipboard

T5872: ipsec remote access VPN: support dhcp-interface.

Open lucasec opened this issue 1 year ago • 12 comments

Change Summary

Enables the dhcp-interface option to be used with ipsec remote-access VPNs.

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Code style update (formatting, renaming)
  • [ ] Refactoring (no functional changes)
  • [ ] Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • [ ] Other (please describe):

Related Task(s)

  • https://vyos.dev/T5872

Related PR(s)

Component(s) name

ipsec remote-access

Proposed changes

This adds the ability to use dhcp-interface with the ipsec remote-access VPN. It works essentially the same way as the dhcp-interface behavior for the ipsec site-to-site VPN.

While most IKEv2 Remote Access ("Road Warrior") VPN setups will likely be at sites with a fixed IP, the DHCP functionality may be useful for isolated installs, say, at a small business or remote branch office that may be on an ISP package that only offers DHCP addressing. Combined with dynamic DNS, VyOS can still provide a one-stop routing and remote access solution.

How to test

Example configuration:

[edit vpn ipsec remote-access connection ClientVPN]
lucas@lcn-router# show
 authentication {
     client-mode eap-tls
     local-id <local id>
     server-mode x509
     x509 {
         ca-certificate <ca certificate name>
         certificate <server certificate name>
     }
 }
 dhcp-interface eth0
 ...

The ipsec daemon should start normally and accept connections. If you inspect /etc/swanctl/swanctl.conf on the running router, you should see a peer configuration that begins with the following (replace 1.2.3.4 with the router's WAN interface IP on eth0):

    ra-ClientVPN {
        remote_addrs = %any
        local_addrs = 1.2.3.4 # dhcp:eth1

You can simulate the DHCP address change behavior by manually invoking the dhclient exit hook:

sudo interface=eth1 old_ip_address=1.2.3.4 new_ip_address=5.6.7.8 /etc/dhcp/dhclient-exit-hooks.d/99-ipsec-dhclient-hook

Smoketest result

DEBUG - Running Testcase: /usr/libexec/vyos/tests/smoke/cli/test_vpn_ipsec.py
DEBUG - test_01_dhcp_fail_handling (__main__.TestVPNIPsec.test_01_dhcp_fail_handling) ... ok
DEBUG - test_02_site_to_site (__main__.TestVPNIPsec.test_02_site_to_site) ... ok
DEBUG - test_03_site_to_site_vti (__main__.TestVPNIPsec.test_03_site_to_site_vti) ... ok
DEBUG - test_04_dmvpn (__main__.TestVPNIPsec.test_04_dmvpn) ... ok
DEBUG - test_05_x509_site2site (__main__.TestVPNIPsec.test_05_x509_site2site) ... ok
DEBUG - test_06_flex_vpn_vips (__main__.TestVPNIPsec.test_06_flex_vpn_vips) ... ok
DEBUG - test_07_ikev2_road_warrior (__main__.TestVPNIPsec.test_07_ikev2_road_warrior) ... ok
DEBUG - test_08_ikev2_road_warrior_client_auth_eap_tls (__main__.TestVPNIPsec.test_08_ikev2_road_warrior_client_auth_eap_tls) ... ok
DEBUG - test_09_ikev2_road_warrior_client_auth_x509 (__main__.TestVPNIPsec.test_09_ikev2_road_warrior_client_auth_x509) ... ok
DEBUG - test_10_ikev2_road_warrior_dhcp_fail_handling (__main__.TestVPNIPsec.test_10_ikev2_road_warrior_dhcp_fail_handling) ... ok

Checklist:

  • [x] I have read the CONTRIBUTING document
  • [x] I have linked this PR to one or more Phabricator Task(s)
  • [x] I have run the components SMOKETESTS if applicable
  • [x] My commit headlines contain a valid Task id
  • [x] My change requires a change to the documentation
  • [ ] I have updated the documentation accordingly

lucasec avatar Feb 08 '24 06:02 lucasec

What happens on dhcp renew when you receive a NEW IP address?

Can you test this please?

c-po avatar Feb 08 '24 21:02 c-po

It's like you know where all the skeletons are buried ;)

I did run this on my lab system, and it seemed to persist an IP change, but after poking around with the hook script a bit more directly I must have just gotten lucky that some other change I was experimenting with caused the swanctl config to re-render.

That script is definitely quite broken. It appears it may even be broken all the way back to 1.3 too (due to ipsec statusall command outputting peer- instead of peer_, at least on 1.3.5 where I tested).

I pushed a fix here that should be valid for 1.4 and 1.5 under the current ticket (T5872). If you have another one open for this issue already let me know and I can re-word the second commit.

lucasec avatar Feb 09 '24 07:02 lucasec

Thanks for the proposed change in src/etc/dhcp/dhclient-exit-hooks.d/99-ipsec-dhclient-hook

I think it's not a good idea to mangle the service definition files underneath - @sarthurdev please correct me.

I would rather see if the dhclient lease changed, and if so - call /usr/libexec/vyos/conf_mode/vpn_ipsec.py so the entire cofig is regenerated from a single point to not leaf space for any arbitrary errors in scripts.

c-po avatar Feb 09 '24 07:02 c-po

I agree the mangling feels like a bit of a hack left over from the 1.2 days. It would also remove the need for error-prone regular expressions (including the new "hack" I added to put the connection name on the same line as the DHCP interface comment to make it easy to find the right connection name to reset).

Would there be any race conditions if someone was mutating the config in config-mode at the time the hook executes?

edit to add: we already invoke the conf_mode script in the case of when the DHCP IP is not available at configure time, if we can do this for IP changes can also remove the need for the /tmp/ipsec_dhcp_waiting file.

lucasec avatar Feb 09 '24 07:02 lucasec

You can safe from such races by checking if a commit is in progress

https://github.com/vyos/vyos-1x/blob/current/python/vyos/utils/commit.py

c-po avatar Feb 09 '24 08:02 c-po

I think it's not a good idea to mangle the service definition files underneath - @sarthurdev please correct me.

Altering the generated files from the hook would likely be more performant than calling the conf script again, but does add complexity (linking interface/connection via comments in-file).

I agree that simplifying the hook to just call the conf script is likely an overall safer solution (with the commit in progress check).

sarthurdev avatar Feb 09 '24 09:02 sarthurdev

The exit hook I pushed that does the mangling is proving to still be pretty broken, so added a few more fixes. In particular, the logic that tires to clear old connections/SAs is proving to be quite dangerous. If the IP actually changes, a new connection will come up after clearing the connection and reloading swanctl. But if the IP does not change, swanctl recognizes that the configuration is still the same when it is asked to reload so the connection stays in the "terminated" state.

It's not really clear to me that logic is still needed. At least for site-to-site peers, swanctl seems to reliably clear old SAs when it reloads and the configuration is changed (at least when using connection-type: initiate). For road warrior, existing child SAs stay up until they time out (unless the client can actually re-resolve and re-negotiate after the IP change, I'm still trying to test if any real-world clients have that behavior).

lucasec avatar Feb 10 '24 19:02 lucasec

To be honest - IKEv2 road warriors terminating at a dynamic endpoint with changing DHCP IPs and a ddns record needing an update can handle a bit of downtime, as also the DNS update requires some time.

IMHO this should not be overengineered.

Such a setup is nice for a lab but broken by design for production/enterprise use

c-po avatar Feb 11 '24 11:02 c-po

@c-po Thanks for your patience while time to focus on this has been hard to come by over here. I just pushed up an update to the hook that dramatically simplifies it. Do take a look at my attempt at race condition handling though—I don't think it's 100% robust but hopefully better than nothing.

I deployed this on my system and am letting it soak for a bit. In general swanctl's reload mechanism seems to work fine for site-to-site connections. If it detects a change it will tear down the old connection and re-create. For road-warrior configurations, it does not seem to tear down the old connections. Like you said though, that probably doesn't need to be over-engineered.

lucasec avatar Mar 10 '24 22:03 lucasec

ignore my paranoia famous last words ;)

That being said, I think we've shaken out all the issues here. Also going to add the "bug fix" checkbox in the description since that hook re-write also fixes the functionality for site-to-site users, which I guess none of us (myself included) realized was broken before...

lucasec avatar Mar 18 '24 06:03 lucasec

@c-po Is this clear to merge or are we waiting on any further input? Also, thoughts on backporting to 1.4?

lucasec avatar Mar 21 '24 05:03 lucasec

@c-po Is this clear to merge or are we waiting on any further input? Also, thoughts on backporting to 1.4?

I‘m waiting on a review from @sarthurdev, after that is done we can backport this of course

c-po avatar Mar 21 '24 06:03 c-po

While we wait for review on this, may be worth taking a look at https://vyos.dev/T6177 which I just filed. I am going to try temporarily changing the DHCP hook to a no-op to see if that increases the time between crashes (I do not believe they are related as the hook only reloads strongswan if the IP actually changes, which I do not believe has happened). Currently the issue seems to reproduce roughly once every 7 days.

lucasec avatar Mar 27 '24 16:03 lucasec

@mergifyio backport sagitta

dmbaturin avatar Mar 28 '24 16:03 dmbaturin

backport sagitta

✅ Backports have been created

mergify[bot] avatar Mar 28 '24 16:03 mergify[bot]

I am going to try temporarily changing the DHCP hook to a no-op to see if that increases the time between crashes (I do not believe they are related as the hook only reloads strongswan if the IP actually changes, which I do not believe has happened). Currently the issue seems to reproduce roughly once every 7 days.

Tested this and still experienced a crash consistent with https://vyos.dev/T6177. So that regression was not introduced here. Continuing to investigate and will add further findings to that ticket.

lucasec avatar Mar 31 '24 07:03 lucasec