vyos-1x icon indicating copy to clipboard operation
vyos-1x copied to clipboard

T5873: ipsec remote access VPN: support VTI interfaces.

Open lucasec opened this issue 1 year ago • 20 comments

Change Summary

Route-based VPNs can be more convenient to configure and tie in nicely with existing routing protocols, zone-based firewalls, and other common network configurations. OpenVPN users are already quite familiar with this pattern. This PR extends the IPsec (IKEv2) Remote Access VPN to support "virtual tunnel interfaces" enabling similar usage patterns.

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Code style update (formatting, renaming)
  • [ ] Refactoring (no functional changes)
  • [ ] Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • [ ] Other (please describe):

Related Task(s)

https://vyos.dev/T5873

Related PR(s)

Component(s) name

ipsec remote-access

Proposed changes

This PR includes two key changes to enable VTI-based remote access VPNs:

  1. The remote-access pool configuration block has been extended to accept a range block, as an alternative to the current CIDR prefix attribute. This allows defining a more granular range for assigning VPN client IPs, which is helpful if you want to reserve one or more IPs at the start of a CIDR block for the router itself on the VTI interface.
  2. The remote-access connection accepts a new bind attribute. This works identically to the peer <peer> vti bind attribute (for site-to-site peers you either define one or more tunnels, or a vti block—there is no equivalent of tunnel for remote-access connections hence the decision not to nest it under vti here). Once defined, all traffic to/from connected peers uses the specified VTI interface, as opposed to being routed by kernel policies. This change is enabled by the internal change in VyOS 1.4 that switched from the legacy vti interface type to the newer xfrm interface type, which happily multiple tunnels with different local/remote traffic selectors, e.g. for each connected VPN client.

How to test

Example configuration:

 interfaces {
     ethernet eth0 {
         ...
     }
     vti vti1 {
         address 10.23.58.1/24
         address fdcc:2200:a8ee:2358::1/64
         description "Client VPN"
         mtu 1436
     }
 }
 vpn {
     ipsec {
         esp-group ClientVPN-Client {
             lifetime 3600
             pfs enable
             proposal 1 {
                 encryption aes256gcm128
                 hash sha256
             }
         }
         ike-group ClientVPN-Client {
             key-exchange ikev2
             lifetime 7200
             proposal 1 {
                 dh-group 21
                 encryption aes256gcm128
                 hash sha256
             }
         }
         options {
             disable-route-autoinstall
         }
         remote-access {
             connection ClientVPN {
                 authentication {
                     client-mode x509
                     local-id <local id>
                     server-mode x509
                     x509 {
                         ca-certificate <ca cert name>
                         certificate <cert name>
                     }
                 }
                 bind vti1
                 dhcp-interface eth0
                 esp-group ClientVPN-Client
                 ike-group ClientVPN-Client
                 pool Client-Pool-v4
                 pool Client-Pool-v6
             }
             pool Client-Pool-v4 {
                 name-server 10.23.58.1
                 range {
                     start 10.23.58.2
                     stop 10.23.58.254
                 }
             }
             pool Client-Pool-v6 {
                 name-server fdcc:2200:a8ee:2358::1
                 range {
                     start fdcc:2200:a8ee:2358::2
                     stop fdcc:2200:a8ee:2358::ffff
                 }
             }
         }
     }
 }

Smoketest result

 INFO - Executing VyOS smoketests
DEBUG - vyos@vyos:~$ /usr/bin/vyos-smoketest
DEBUG - /usr/bin/vyos-smoketest
DEBUG - Running Testcase: /usr/libexec/vyos/tests/smoke/cli/test_vpn_ipsec.py
DEBUG - test_dhcp_fail_handling (__main__.TestVPNIPsec.test_dhcp_fail_handling) ... ok
DEBUG - test_dmvpn (__main__.TestVPNIPsec.test_dmvpn) ... ok
DEBUG - test_flex_vpn_vips (__main__.TestVPNIPsec.test_flex_vpn_vips) ... ok
DEBUG - test_remote_access (__main__.TestVPNIPsec.test_remote_access) ... ok
DEBUG - test_remote_access_dhcp_fail_handling (__main__.TestVPNIPsec.test_remote_access_dhcp_fail_handling) ... ok
DEBUG - test_remote_access_eap_tls (__main__.TestVPNIPsec.test_remote_access_eap_tls) ... ok
DEBUG - test_remote_access_pool_range (__main__.TestVPNIPsec.test_remote_access_pool_range) ... ok
DEBUG - test_remote_access_vti (__main__.TestVPNIPsec.test_remote_access_vti) ... ok
DEBUG - test_remote_access_x509 (__main__.TestVPNIPsec.test_remote_access_x509) ... ok
DEBUG - test_site_to_site (__main__.TestVPNIPsec.test_site_to_site) ... ok
DEBUG - test_site_to_site_vti (__main__.TestVPNIPsec.test_site_to_site_vti) ... ok
DEBUG - test_site_to_site_x509 (__main__.TestVPNIPsec.test_site_to_site_x509) ... ok

Checklist:

  • [x] I have read the CONTRIBUTING document
  • [x] I have linked this PR to one or more Phabricator Task(s)
  • [x] I have run the components SMOKETESTS if applicable
  • [x] My commit headlines contain a valid Task id
  • [x] My change requires a change to the documentation
  • [ ] I have updated the documentation accordingly

lucasec avatar Apr 01 '24 00:04 lucasec

Changing this to a draft as it appears the up/down script may need some work for this to work properly. Seems to have been lost in the rebase.

lucasec avatar Apr 01 '24 02:04 lucasec

@lucasec is the fix in #3302 in any way related to the issue you mention?

GurliGebis avatar Apr 12 '24 20:04 GurliGebis

It’s not related to this PR. I also don’t think that fix is related to the instability I’ve been seeing in https://vyos.dev/T6177. I am planning to resume work on this shortly while I continue to debug that issue.

lucasec avatar Apr 12 '24 20:04 lucasec

Sounds great, thank you 🙂

GurliGebis avatar Apr 13 '24 04:04 GurliGebis

Very interesting approach. But why I need also pools if now ecerything cones from the VTI? Is it b/c of IP address assignment reasons (fake dhcp?)

c-po avatar Apr 13 '24 04:04 c-po

Yeah, the IP addresses are still assigned to clients via the IKE protocol, so the pool configuration is needed to tell strongswan what range to create client IPs from.

Getting the up/down logic for the interface right is a little interesting. I would assume if a remote access is bound to the VTI, the VTI interface should be up all the time.

Cleanest implementation is probably to have set_admin_state (here: https://github.com/vyos/vyos-1x/blob/9eb018c4935235d292d7c693ac15da5761be064a/python/vyos/ifconfig/vti.py#L59) check for dependencies in the ipsec config, then only no-op if the interface is unbound or bound to a single site-to-site config (so for remote access, the interface would come up immediately via the normal interface up/down code).

Btw the logic for T6085 for site-to-site configs (https://github.com/vyos/vyos-1x/commit/9eb018c4935235d292d7c693ac15da5761be064a) has some potentially unintended behavior that disabling/re-enabling a VTI interface in the config does not immediately take effect—the interface state will only update after the ipsec connection gets torn down or ipsec is restarted.

lucasec avatar Apr 15 '24 08:04 lucasec

Any updates?

sever-sever avatar May 14 '24 17:05 sever-sever

I think I set this aside awaiting further feedback on the approach for the up/down script I shared in this comment: https://github.com/vyos/vyos-1x/pull/3221#issuecomment-2056106543.

Let me re-visit later this week and try to put together an implementation—that may be a faster way to move this forward.

lucasec avatar May 14 '24 18:05 lucasec

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar May 21 '24 10:05 github-actions[bot]

This might be off topic, but how come there all of a sudden is a conflict, with no new commits?

GurliGebis avatar May 23 '24 06:05 GurliGebis

This might be off topic, but how come there all of a sudden is a conflict, with no new commits?

The action was not run automatically for quiet some time

c-po avatar May 23 '24 07:05 c-po

Okay, that explains it 🙂

GurliGebis avatar May 23 '24 08:05 GurliGebis

Hi all, I haven't forgotten about this PR, but my work schedule has been rather unforgiving lately. Thanks for your patience and I'll hopefully have some improvements soon to get this closer to merging.

lucasec avatar May 30 '24 18:05 lucasec

I am about half way through an implementation for the vti-up-down hook and just wanted to write out my approach in case anyone has early feedback on it.

There are three issues with the current vti-up-down hook:

  1. The hook does not work with ipsec remote-access configurations, since a single configuration can map to potentially many connections (e.g. one for each client). The current hook brings the entire interface up every time any connection connects, and down every time any connection disconnects.
  2. While technically a special case of 1, for the same reason as 1 the hook does not allow multiple ipsec connections to share a single VTI interface (see https://vyos.dev/T5874 for some motivation why we might want this).
  3. Also slightly out of scope of T5873, though I intend to address at the same time: setting or clearing the disable attribute on a VTI interface does not immediately take effect, as set_admin_state in the interface code is no-oped. The only way currently to disable a VTI interface is to configure it as disabled then restart the ipsec daemon.

My plan to introduce a new database file /tmp/ipsec_vti_interfaces that will store a space-separated list of entries. The vti-up-down hook will write entries to this DB of the following format: interface:connection-name:protocol, where interface is the VTI interface to bring up, connection-name is the ipsec connection calling for the interface to be brought up, and protocol is v4 for IPv4 or v6 for IPv6. The config logic can write an additional entry of the format interface if the VTI interface is called to always be up (in the case of remote-access connections).

For remote-access connections, the config logic will also bring the interface up at apply time if it is not already. The VTI class's set_admin_state method will also re-gain its ability to bring interfaces up and down, but before bringing an interface up it will check the DB file to see if a connection is calling for it to be up.

Obviously this has turned into a bigger change than I was expecting, but hopefully once I work the last few kinks out it will buy us a more robust foundation for future ipsec/VTI work.

lucasec avatar Jun 24 '24 06:06 lucasec

@lucasec is there a way the database can end up inconsistent? (Like a connection not getting removed if something crashes)

GurliGebis avatar Jun 24 '24 19:06 GurliGebis

Just got an idea - not sure if it would work - but what if the "down" parsed the output from swanctl --list-conns - wouldn't it be possible to detect if any more connections exist, and then only down the interface if the last one is gone?

GurliGebis avatar Jun 25 '24 08:06 GurliGebis

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar Jul 04 '24 06:07 github-actions[bot]

👍 No issues in PR Title / Commit Title

github-actions[bot] avatar Jul 04 '24 06:07 github-actions[bot]

Adding a first draft of an implementation in the vain of https://github.com/vyos/vyos-1x/pull/3221#issuecomment-2185714342.

This is still pending even the most basic testing, will update once that is complete.

In general I think the approach turned out somewhat clean, although it still needs a little bit of hardening for race conditions between the configuration code and the vti-up-down hook (possibly using some sort of file-based locking mechanism, need to see if there is any prior art here).

@GurliGebis The reason for avoiding swanctl --list-conns as I'm not aware of strongswan giving any guarantees as to whether the output of that command will have updated while hook execution is still ongoing. Would need a deep dive in their source code to determine if that is subject to any race conditions or concurrency issues. With the current logic I update the DB by connection name and protocol (the current hook implementation completely ignores IPv6 events!), which allows me to know whether any connections are still up.

re: database inconsistency if strongswan crashes—yes it could happen, e.g. strongswan could crash, never send a connection down hook, and never re-establish the connection after it restarts. Interface stays up when it shouldn't be. This scenario is possible with the current hook logic too that just directly calls ifup/ifdown. This shouldn't be introducing any new regressions, but this is certainly a scenario we may want to circle back and optimize later.

lucasec avatar Jul 04 '24 06:07 lucasec

Nice to see the smoketests running via github actions now, but they don't seem to actually test using my branch. E.g. the smoketest run is not including any new smoketests added via this PR.

lucasec avatar Jul 05 '24 08:07 lucasec

Nice to see the smoketests running via github actions now, but they don't seem to actually test using my branch. E.g. the smoketest run is not including any new smoketests added via this PR.

This is because your branch was branched of from current during a time where the smoketests were not evolved enough. Just rebase to current and there you go!

c-po avatar Jul 05 '24 12:07 c-po

❌ warning: Unused import os in smoketest/scripts/cli/test_interfaces_l2tpv3.py:17.

github-actions[bot] avatar Jul 05 '24 17:07 github-actions[bot]

This is because your branch was branched of from current during a time where the smoketests were not evolved enough. Just rebase to current and there you go!

Rebased to most recent current but still not seeing it run my tests.

There also seem to be two persistently failing that are unrelated to my PR:

  • test_ospf_15_ldp_sync (__main__.TestProtocolsOSPF.test_ospf_15_ldp_sync)
  • test_09_lb_reverse_proxy_logging (__main__.TestLoadBalancingReverseProxy.test_09_lb_reverse_proxy_logging)

lucasec avatar Jul 05 '24 19:07 lucasec

The LB reverse proxy error is fixed in https://github.com/vyos/vyos-1x/pull/3790

The OSPF issue I have still not found the root cause

c-po avatar Jul 05 '24 20:07 c-po

Added additional comments and improved method naming.

New smoketests pass and this is largely ready for others to try out, but I am still refining/working on a few race conditions found via manual testing on my live system. Some discussion in Slack here: https://vyos-community.slack.com/archives/C01A6CJFW1F/p1720164511770299?thread_ts=1719111681.530719&cid=C01A6CJFW1F

lucasec avatar Jul 05 '24 20:07 lucasec

From testing so far, the ipsec daemon seems to invoke the up-down hook synchronously, so there is not much need to worry about races on the VTI up-down DB (/tmp/ipsec-vti-interfaces file) outside of when the hook runs concurrently to a config commit. That latter case was already affected by a race (that happens to also affect the existing hook) where attempting to read the config during a commit can fail and return a blank interface config.

Both the DB race and config read race are largely solvable by calling wait_for_commit_lock before doing anything in the hook. (Not 100% solvable as wait_for_commit_lock doesn't acquire any sort of lock itself, providing no guarantee that a commit won't start while the script is still running, but this possibility is both rare and not exclusive to this hook)

I am also removing some old logic around clearing route table 220 and warnings about the combination of VTI and the lack of disable-route-autoinstall. Needing to disable route installation was specific to the legacy vti interface type implementation and does not appear to be relevant to the xfrm implementation used now.

lucasec avatar Jul 07 '24 09:07 lucasec

FYI, the only smoke test failure in that latest execution is test_ospf_15_ldp_sync.

I still am not seeing the smoketest action including new tests I added in this PR, e.g. test_remote_access_pool_range and test_remote_access_vti. These are included in the run and passing on my local machine though.

I am reasonably confident in the implementation now and would say everything here is ready for review.

lucasec avatar Jul 08 '24 05:07 lucasec

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Jul 22 '24 14:07 github-actions[bot]

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar Jul 22 '24 17:07 github-actions[bot]

Rebased on top of changes merged in https://github.com/vyos/vyos-1x/pull/3841. FYI the smoketests CI job still doesn't seem to run the new tests added in this PR. I did confirm these to be passing on my local machine.

lucasec avatar Jul 22 '24 22:07 lucasec