vyos-1x
T5873: ipsec remote access VPN: support VTI interfaces.
Change Summary
Route-based VPNs can be more convenient to configure and tie in nicely with existing routing protocols, zone-based firewalls, and other common network configurations. OpenVPN users are already quite familiar with this pattern. This PR extends the IPsec (IKEv2) Remote Access VPN to support "virtual tunnel interfaces" enabling similar usage patterns.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Code style update (formatting, renaming)
- [ ] Refactoring (no functional changes)
- [ ] Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
- [ ] Other (please describe):
Related Task(s)
https://vyos.dev/T5873
Related PR(s)
Component(s) name
ipsec remote-access
Proposed changes
This PR includes two key changes to enable VTI-based remote access VPNs:
- The `remote-access pool` configuration block has been extended to accept a `range` block as an alternative to the current CIDR `prefix` attribute. This allows defining a more granular range for assigning VPN client IPs, which is helpful if you want to reserve one or more IPs at the start of a CIDR block for the router itself on the VTI interface.
- The `remote-access connection` block accepts a new `bind` attribute. This works identically to the `peer <peer> vti bind` attribute (for site-to-site peers you either define one or more `tunnel` blocks or a `vti` block; there is no equivalent of `tunnel` for remote-access connections, hence the decision not to nest it under `vti` here). Once defined, all traffic to/from connected peers uses the specified VTI interface, as opposed to being routed by kernel policies. This change is enabled by the internal change in VyOS 1.4 that switched from the legacy `vti` interface type to the newer `xfrm` interface type, which happily supports multiple tunnels with different local/remote traffic selectors, e.g. one for each connected VPN client. Equivalent `set` commands for both new nodes are shown after this list.
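For reference, the two new nodes map to `set` commands like the following (the interface and pool names are taken from the example configuration below):

```
set vpn ipsec remote-access connection ClientVPN bind vti1
set vpn ipsec remote-access pool Client-Pool-v4 range start 10.23.58.2
set vpn ipsec remote-access pool Client-Pool-v4 range stop 10.23.58.254
```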
How to test
Example configuration:
interfaces {
ethernet eth0 {
...
}
vti vti1 {
address 10.23.58.1/24
address fdcc:2200:a8ee:2358::1/64
description "Client VPN"
mtu 1436
}
}
vpn {
ipsec {
esp-group ClientVPN-Client {
lifetime 3600
pfs enable
proposal 1 {
encryption aes256gcm128
hash sha256
}
}
ike-group ClientVPN-Client {
key-exchange ikev2
lifetime 7200
proposal 1 {
dh-group 21
encryption aes256gcm128
hash sha256
}
}
options {
disable-route-autoinstall
}
remote-access {
connection ClientVPN {
authentication {
client-mode x509
local-id <local id>
server-mode x509
x509 {
ca-certificate <ca cert name>
certificate <cert name>
}
}
bind vti1
dhcp-interface eth0
esp-group ClientVPN-Client
ike-group ClientVPN-Client
pool Client-Pool-v4
pool Client-Pool-v6
}
pool Client-Pool-v4 {
name-server 10.23.58.1
range {
start 10.23.58.2
stop 10.23.58.254
}
}
pool Client-Pool-v6 {
name-server fdcc:2200:a8ee:2358::1
range {
start fdcc:2200:a8ee:2358::2
stop fdcc:2200:a8ee:2358::ffff
}
}
}
}
}
Smoketest result
INFO - Executing VyOS smoketests
DEBUG - vyos@vyos:~$ /usr/bin/vyos-smoketest
DEBUG - /usr/bin/vyos-smoketest
DEBUG - Running Testcase: /usr/libexec/vyos/tests/smoke/cli/test_vpn_ipsec.py
DEBUG - test_dhcp_fail_handling (__main__.TestVPNIPsec.test_dhcp_fail_handling) ... ok
DEBUG - test_dmvpn (__main__.TestVPNIPsec.test_dmvpn) ... ok
DEBUG - test_flex_vpn_vips (__main__.TestVPNIPsec.test_flex_vpn_vips) ... ok
DEBUG - test_remote_access (__main__.TestVPNIPsec.test_remote_access) ... ok
DEBUG - test_remote_access_dhcp_fail_handling (__main__.TestVPNIPsec.test_remote_access_dhcp_fail_handling) ... ok
DEBUG - test_remote_access_eap_tls (__main__.TestVPNIPsec.test_remote_access_eap_tls) ... ok
DEBUG - test_remote_access_pool_range (__main__.TestVPNIPsec.test_remote_access_pool_range) ... ok
DEBUG - test_remote_access_vti (__main__.TestVPNIPsec.test_remote_access_vti) ... ok
DEBUG - test_remote_access_x509 (__main__.TestVPNIPsec.test_remote_access_x509) ... ok
DEBUG - test_site_to_site (__main__.TestVPNIPsec.test_site_to_site) ... ok
DEBUG - test_site_to_site_vti (__main__.TestVPNIPsec.test_site_to_site_vti) ... ok
DEBUG - test_site_to_site_x509 (__main__.TestVPNIPsec.test_site_to_site_x509) ... ok
Checklist:
- [x] I have read the CONTRIBUTING document
- [x] I have linked this PR to one or more Phabricator Task(s)
- [x] I have run the components SMOKETESTS if applicable
- [x] My commit headlines contain a valid Task id
- [x] My change requires a change to the documentation
- [ ] I have updated the documentation accordingly
Converting this to a draft, as it appears the up/down script needs some work for this to function properly. That part seems to have been lost in the rebase.
@lucasec is the fix in #3302 in any way related to the issue you mention?
It’s not related to this PR. I also don’t think that fix is related to the instability I’ve been seeing in https://vyos.dev/T6177. I am planning to resume work on this shortly while I continue to debug that issue.
Sounds great, thank you 🙂
Very interesting approach. But why do I also need pools if everything now comes from the VTI? Is it because of IP address assignment reasons (fake DHCP?)
Yeah, the IP addresses are still assigned to clients via the IKE protocol, so the pool configuration is needed to tell strongswan what range to create client IPs from.
Getting the up/down logic for the interface right is a little interesting. I would assume that if a remote-access connection is bound to the VTI, the VTI interface should be up all the time.
The cleanest implementation is probably to have `set_admin_state` (here: https://github.com/vyos/vyos-1x/blob/9eb018c4935235d292d7c693ac15da5761be064a/python/vyos/ifconfig/vti.py#L59) check for dependencies in the ipsec config, then only no-op if the interface is unbound or bound to a single site-to-site config (so for remote access, the interface would come up immediately via the normal interface up/down code).
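To make the intent concrete, here is a minimal sketch of that dependency check; the helper and parameter names are illustrative, not from the codebase:

```python
# Illustrative sketch only: decides whether set_admin_state should
# remain a no-op and leave interface state to the vti-up-down hook.
def should_defer_to_ipsec_hook(ifname: str, bindings: dict) -> bool:
    """`bindings` maps a VTI interface name to the list of ipsec
    connection kinds bound to it, e.g. {'vti1': ['site-to-site']}."""
    kinds = bindings.get(ifname, [])
    # Unbound, or bound to a single site-to-site connection: keep the
    # current behavior and let the ipsec hook drive the interface.
    if not kinds or kinds == ['site-to-site']:
        return True
    # Remote-access (or shared) bindings: the normal interface up/down
    # code should take effect immediately.
    return False
```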
Btw, the logic for T6085 for site-to-site configs (https://github.com/vyos/vyos-1x/commit/9eb018c4935235d292d7c693ac15da5761be064a) has some potentially unintended behavior: disabling/re-enabling a VTI interface in the config does not immediately take effect; the interface state will only update after the ipsec connection gets torn down or ipsec is restarted.
Any updates?
I think I set this aside awaiting further feedback on the approach for the up/down script I shared in this comment: https://github.com/vyos/vyos-1x/pull/3221#issuecomment-2056106543.
Let me re-visit later this week and try to put together an implementation—that may be a faster way to move this forward.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
This might be off topic, but how come there all of a sudden is a conflict, with no new commits?
> This might be off topic, but how come there all of a sudden is a conflict, with no new commits?

The action was not run automatically for quite some time.
Okay, that explains it 🙂
Hi all, I haven't forgotten about this PR, but my work schedule has been rather unforgiving lately. Thanks for your patience and I'll hopefully have some improvements soon to get this closer to merging.
I am about half way through an implementation for the vti-up-down hook and just wanted to write out my approach in case anyone has early feedback on it.
There are three issues with the current vti-up-down hook:
- The hook does not work with ipsec remote-access configurations, since a single configuration can map to potentially many connections (e.g. one for each client). The current hook brings the entire interface up every time any connection connects, and down every time any connection disconnects.
- While technically a special case of the first issue, and for the same reason, the hook does not allow multiple ipsec connections to share a single VTI interface (see https://vyos.dev/T5874 for some motivation why we might want this).
- Also slightly out of scope of T5873, though I intend to address it at the same time: setting or clearing the `disable` attribute on a VTI interface does not immediately take effect, as `set_admin_state` in the interface code is no-oped. The only way currently to disable a VTI interface is to configure it as disabled and then restart the ipsec daemon.
My plan is to introduce a new database file `/tmp/ipsec_vti_interfaces` that will store a space-separated list of entries. The vti-up-down hook will write entries to this DB in the following format: `interface:connection-name:protocol`, where `interface` is the VTI interface to bring up, `connection-name` is the ipsec connection calling for the interface to be brought up, and `protocol` is `v4` for IPv4 or `v6` for IPv6. The config logic can write an additional entry consisting of just `interface` if the VTI interface should always be up (in the case of remote-access connections).
For remote-access connections, the config logic will also bring the interface up at apply time if it is not already up. The VTI class's `set_admin_state` method will also regain its ability to bring interfaces up and down, but before bringing an interface up it will check the DB file to see if a connection is calling for it to be up.
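A rough sketch of how the hook and config logic might read and update that DB file, per the format described above (function names are illustrative; locking and error handling are omitted):

```python
from pathlib import Path

# Proposed DB: a space-separated list of entries, each either
# 'interface:connection-name:protocol' or a bare 'interface'.
DB = Path('/tmp/ipsec_vti_interfaces')

def _read() -> set:
    return set(DB.read_text().split()) if DB.exists() else set()

def _write(entries: set) -> None:
    DB.write_text(' '.join(sorted(entries)))

def connection_up(ifname: str, conn: str, proto: str) -> None:
    # Called from the vti-up-down hook; proto is 'v4' or 'v6'.
    _write(_read() | {f'{ifname}:{conn}:{proto}'})

def connection_down(ifname: str, conn: str, proto: str) -> None:
    _write(_read() - {f'{ifname}:{conn}:{proto}'})

def wanted_up(ifname: str) -> bool:
    # True if any connection entry, or a bare entry written by the
    # config logic for remote-access, calls for the interface to be up.
    return any(e == ifname or e.startswith(f'{ifname}:') for e in _read())
```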
Obviously this has turned into a bigger change than I was expecting, but hopefully once I work the last few kinks out it will buy us a more robust foundation for future ipsec/VTI work.
@lucasec is there a way the database can end up inconsistent? (Like a connection not getting removed if something crashes)
Just got an idea (not sure if it would work), but what if the "down" hook parsed the output of `swanctl --list-conns`? Wouldn't it be possible to detect whether any more connections exist, and then only bring the interface down once the last one is gone?
Conflicts have been resolved. A maintainer will review the pull request shortly.
👍 No issues in PR Title / Commit Title
Adding a first draft of an implementation in the vein of https://github.com/vyos/vyos-1x/pull/3221#issuecomment-2185714342.
This is still pending even the most basic testing, will update once that is complete.
In general I think the approach turned out somewhat clean, although it still needs a little bit of hardening for race conditions between the configuration code and the vti-up-down hook (possibly using some sort of file-based locking mechanism; I need to see if there is any prior art here).
@GurliGebis The reason for avoiding `swanctl --list-conns` is that I'm not aware of strongswan giving any guarantees as to whether the output of that command will have been updated while hook execution is still ongoing. It would need a deep dive into their source code to determine whether it is subject to any race conditions or concurrency issues. With the current logic I update the DB by connection name and protocol (the current hook implementation completely ignores IPv6 events!), which allows me to know whether any connections are still up.
Re: database inconsistency if strongswan crashes: yes, it could happen. For example, strongswan could crash, never send a connection-down hook event, and never re-establish the connection after it restarts; the interface then stays up when it shouldn't be. This scenario is possible with the current hook logic too, which just directly calls ifup/ifdown, so this shouldn't introduce any new regressions, but it is certainly a scenario we may want to circle back and optimize for later.
Nice to see the smoketests running via github actions now, but they don't seem to actually test using my branch. E.g. the smoketest run is not including any new smoketests added via this PR.
> Nice to see the smoketests running via github actions now, but they don't seem to actually test using my branch. E.g. the smoketest run is not including any new smoketests added via this PR.

This is because your branch was branched off from `current` during a time when the smoketests were not evolved enough. Just rebase to `current` and there you go!
❌ warning: Unused import os in smoketest/scripts/cli/test_interfaces_l2tpv3.py:17.
> This is because your branch was branched off from `current` during a time when the smoketests were not evolved enough. Just rebase to `current` and there you go!
Rebased onto the most recent `current`, but I'm still not seeing it run my tests.
There also seem to be two persistently failing tests that are unrelated to my PR:
- `test_ospf_15_ldp_sync` (`__main__.TestProtocolsOSPF.test_ospf_15_ldp_sync`)
- `test_09_lb_reverse_proxy_logging` (`__main__.TestLoadBalancingReverseProxy.test_09_lb_reverse_proxy_logging`)
The LB reverse proxy error is fixed in https://github.com/vyos/vyos-1x/pull/3790
The OSPF issue I have still not found the root cause
Added additional comments and improved method naming.
New smoketests pass and this is largely ready for others to try out, but I am still refining/working on a few race conditions found via manual testing on my live system. Some discussion in Slack here: https://vyos-community.slack.com/archives/C01A6CJFW1F/p1720164511770299?thread_ts=1719111681.530719&cid=C01A6CJFW1F
From testing so far, the ipsec daemon seems to invoke the up-down hook synchronously, so there is not much need to worry about races on the VTI up-down DB (the /tmp/ipsec-vti-interfaces file) outside of when the hook runs concurrently with a config commit. That latter case was already affected by a race (one that also affects the existing hook) where attempting to read the config during a commit can fail and return a blank interface config.
Both the DB race and config read race are largely solvable by calling wait_for_commit_lock before doing anything in the hook.
(Not 100% solvable as wait_for_commit_lock doesn't acquire any sort of lock itself, providing no guarantee that a commit won't start while the script is still running, but this possibility is both rare and not exclusive to this hook)
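In hook form, the guard would look something like the sketch below; the exact import path of the helper within the vyos python package is an assumption here:

```python
from vyos.utils.commit import wait_for_commit_lock  # assumed location

def main() -> None:
    # Block until any in-flight commit finishes before reading the
    # interface config or touching the VTI up-down DB. Note this does
    # not itself take the lock, so a commit could still begin after we
    # proceed (the rare case noted above).
    wait_for_commit_lock()
    # ... read config, update /tmp/ipsec-vti-interfaces, and adjust
    # the interface state as described above ...

if __name__ == '__main__':
    main()
```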
I am also removing some old logic around clearing route table 220 and warnings about the combination of VTI and the lack of disable-route-autoinstall. Needing to disable route installation was specific to the legacy vti interface type implementation and does not appear to be relevant to the xfrm implementation used now.
FYI, the only smoke test failure in that latest execution is test_ospf_15_ldp_sync.
I still am not seeing the smoketest action including new tests I added in this PR, e.g. test_remote_access_pool_range and test_remote_access_vti. These are included in the run and passing on my local machine though.
I am reasonably confident in the implementation now and would say everything here is ready for review.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
Rebased on top of changes merged in https://github.com/vyos/vyos-1x/pull/3841. FYI the smoketests CI job still doesn't seem to run the new tests added in this PR. I did confirm these to be passing on my local machine.