routing: T1237: Add new feature failover route

Open sever-sever opened this issue 3 years ago • 2 comments

Change Summary

The failover route allows installing static routes to the kernel routing table only if the required target or gateway is alive. When the target or gateway doesn't respond to ICMP/ARP checks, the route is deleted from the routing table. Routes are marked with the protocol failover (rt_protos):

cat /etc/iproute2/rt_protos.d/failover.conf
111  failover
ip route add 203.0.113.1 metric 2 via 192.0.2.1 dev eth0 proto failover

$ sudo ip route show proto failover
203.0.113.1 via 192.0.2.1 dev eth0 metric 1

Because these routes carry their own protocol tag, we can safely flush them without touching any other routes.
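As a rough illustration of why the dedicated protocol tag helps, the sketch below only builds the `ip route` argument lists and does not execute them. The function names are hypothetical and are not taken from vyos-failover.py.

```python
def route_add_cmd(prefix, gateway, dev, metric, proto="failover"):
    """Build (not run) an `ip route add` command tagged with a routing protocol."""
    return ["ip", "route", "add", prefix, "metric", str(metric),
            "via", gateway, "dev", dev, "proto", proto]


def route_flush_cmd(proto="failover"):
    """Build (not run) a flush command that selects routes by protocol tag only."""
    return ["ip", "route", "flush", "proto", proto]


print(" ".join(route_add_cmd("203.0.113.1", "192.0.2.1", "eth0", 2)))
print(" ".join(route_flush_cmd()))
```

Since the flush selects on `proto failover`, static, connected, and dynamically learned routes are left alone.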

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Code style update (formatting, renaming)
  • [ ] Refactoring (no functional changes)
  • [ ] Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • [ ] Other (please describe):

Related Task(s)

  • https://phabricator.vyos.net/T1237

Component(s) name

failover, route

Proposed changes

How to test

VyOS configuration:

set protocols failover route 203.0.113.1/32 next-hop 192.168.100.1 check target '192.168.100.1'
set protocols failover route 203.0.113.1/32 next-hop 192.168.100.1 check timeout '10'
set protocols failover route 203.0.113.1/32 next-hop 192.168.100.1 check type 'icmp'
set protocols failover route 203.0.113.1/32 next-hop 192.168.100.1 interface 'eth1'
set protocols failover route 203.0.113.1/32 next-hop 192.168.100.1 metric '2'
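To make the intended behaviour concrete, here is a minimal sketch of one check cycle for the route configured above. The dictionary layout and the helpers `icmp_check_cmd` and `decide` are assumptions made for illustration; they do not describe the real configuration format or the code of the failover service.

```python
# Hypothetical in-memory form of the CLI configuration above
# (an assumption, not the format the service actually reads).
route = {
    "prefix": "203.0.113.1/32",
    "next_hop": "192.168.100.1",
    "interface": "eth1",
    "metric": 2,
    "check": {"type": "icmp", "target": "192.168.100.1", "timeout": 10},
}


def icmp_check_cmd(target, interface):
    """Build (not run) a single-probe ping bound to the outgoing interface."""
    # -c 1: send one echo request, -W 1: wait at most one second for a reply
    return ["ping", "-c", "1", "-W", "1", "-I", interface, target]


def decide(target_alive, route_installed):
    """Return the action for one check cycle: 'add', 'del' or None."""
    if target_alive and not route_installed:
        return "add"
    if not target_alive and route_installed:
        return "del"
    return None  # state already matches, nothing to do


print(" ".join(icmp_check_cmd(route["check"]["target"], route["interface"])))
print(decide(target_alive=True, route_installed=False))
print(decide(target_alive=False, route_installed=True))
```

Keeping the decision separate from the probe makes the add/delete logic easy to test without touching the kernel routing table.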

Check service

vyos@r14# sudo systemctl status vyos-failover
● vyos-failover.service - Failover route service
     Loaded: loaded (/etc/systemd/system/vyos-failover.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-06-13 19:53:23 EEST; 4s ago
   Main PID: 6515 (python3)
      Tasks: 1 (limit: 4695)
     Memory: 6.1M
        CPU: 35ms
     CGroup: /system.slice/vyos-failover.service
             └─6515 /usr/bin/python3 /usr/libexec/vyos/vyos-failover.py --config /run/vyos-failover.conf

Check routing table

vyos@r14:~$ show ip route | grep 203
K>* 203.0.113.1/32 [0/2] via 192.168.100.1, eth1, 00:01:00
vyos@r14:~$ 
vyos@r14:~$ sudo ip route show proto failover
203.0.113.1 via 192.168.100.1 dev eth1 metric 2 
vyos@r14:~$ 

Deleting protocols failover must remove all routes marked proto failover

vyos@r14# delete protocols failover 
[edit]
vyos@r14# commit
[edit]
vyos@r14# sudo ip route show proto failover
[edit]
vyos@r14# 

Checklist:

  • [x] I have read the CONTRIBUTING document
  • [x] I have linked this PR to one or more Phabricator Task(s)
  • [ ] I have run the components SMOKETESTS if applicable
  • [x] My commit headlines contain a valid Task id
  • [x] My change requires a change to the documentation
  • [ ] I have updated the documentation accordingly

sever-sever, Jun 13 '22 16:06

I wonder if we can merge this PR with the WAN load-balance functionality for VyOS 1.4. Other vendors refer to such a feature as IP SLA. Personally, I find the CLI notation a bit clumsy, but I have no better idea yet, sorry.

c-po, Sep 28 '22 17:09

@sever-sever very cool, this is already a much better solution than the current WLB implementation. Two things relating to timing, though. First, the timeout parameter might be better named "interval" to reflect the way it works behind the scenes. Second, this parameter does not seem to be used for TCP checks: the timeout there is fixed at 2 seconds, so if I were to configure a 1 second timeout, none of the other checks are going to happen for a full 3 seconds (if it's down). The same happens with many routes that have long timeouts. A quick(ish) workaround would be to use instance parameters on a templated systemd job, so one service instance per route. Clean-up is straightforward since it's "kill everything I don't know about", "(re)start what I do know about".
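The clean-up rule quoted above could look roughly like the following sketch. The template unit name vyos-failover@.service and the instance names are hypothetical; the PR itself ships a single vyos-failover.service.

```python
def plan_instances(running, configured, template="vyos-failover@{}.service"):
    """Return (to_stop, to_restart) unit names for per-route service instances."""
    running, configured = set(running), set(configured)
    # "kill everything I don't know about"
    to_stop = sorted(template.format(name) for name in running - configured)
    # "(re)start what I do know about"
    to_restart = sorted(template.format(name) for name in configured)
    return to_stop, to_restart


stop, restart = plan_instances(running={"r1", "r2"}, configured={"r2", "r3"})
print("systemctl stop", *stop)
print("systemctl restart", *restart)
```

With one instance per route, a slow or timed-out check on one route would no longer delay the checks for the others.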

I would also consider table and vrf parameters. Table in particular is useful for PBR: in table 10 you could have consumer bulk-traffic uplinks for general internet traffic but allow failover to a priority link if the former are down, with the reverse of this in table 20 for the actual priority traffic.
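A possible shape for this suggestion, as a sketch only: the helper below appends the iproute2 `table` or `vrf` keyword to a failover route command. The function name, the example addresses, and the choice to treat the two options as mutually exclusive are assumptions of this sketch, not part of the PR.

```python
def failover_route_cmd(prefix, gateway, dev, metric, table=None, vrf=None):
    """Build (not run) an `ip route add` command with an optional table or VRF."""
    if table is not None and vrf is not None:
        # Design choice of this sketch: a VRF already implies its own table.
        raise ValueError("specify either table or vrf, not both")
    cmd = ["ip", "route", "add", prefix, "via", gateway, "dev", dev,
           "metric", str(metric), "proto", "failover"]
    if table is not None:
        cmd += ["table", str(table)]
    if vrf is not None:
        cmd += ["vrf", vrf]
    return cmd


# Table 10: bulk uplink preferred, priority link as the fallback (example values).
print(" ".join(failover_route_cmd("0.0.0.0/0", "198.51.100.1", "eth1", 1, table=10)))
print(" ".join(failover_route_cmd("0.0.0.0/0", "203.0.113.254", "eth2", 2, table=10)))
```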

As mentioned in the other issue, there is the dpinger tool that could be added along with this, which would add even more functionality. I have done a basic test to see whether it compiles on VyOS, and indeed it does and functions as expected.

@c-po I think we should call it SLA if the above can be implemented, as from my understanding IP SLA implies measurement of link quality, not just the possibility of routing. Being able to run show ip sla statistics on VyOS would be next level 👀

thetooth, Oct 12 '22 10:10

Hello. Do you plan to use this mechanism for WAN failover?

Harliff, Jan 24 '24 07:01