libnl icon indicating copy to clipboard operation
libnl copied to clipboard

Changes in IPv6 nexthops cause nl_cache_mngr routes to have duplicate…

Open tlsalmin opened this issue 4 years ago • 2 comments

… nexthops.

Changing the nexthop of a IPv6 caused a netlink cache to retain both nexthops for the route. Kernel in fib6_add_rt2node will handle an RTM_NEWROUTE with NLM_F_REPLACE by removing the other nexthops with a gateway. Introduced this behaviour to route_obj.c:route_update().

The nlmsg_flags, which include NLM_F_REPLACE, were omitted from the new route object. Added only the NLM_F_REPLACE flag if present.

tlsalmin avatar Aug 20 '21 11:08 tlsalmin

~/src/random_tests on heads/master ● ● λ ./nl_mngr_bug &                                                       
[1] 134369
~/src/random_tests on heads/master ● ● λ ip -6 r            
::1 dev lo proto kernel metric 256 pref medium
anycast fe80:: dev enp34s0 proto kernel metric 0 pref medium
anycast fe80:: dev tun0 proto kernel metric 0 pref medium
anycast fe80:: dev br-6a8cdaf9abb2 proto kernel metric 0 pref medium
fe80::/64 dev enp34s0 proto kernel metric 256 pref medium
fe80::/64 dev tun0 proto kernel metric 256 pref medium
fe80::/64 dev br-6a8cdaf9abb2 proto kernel metric 256 linkdown pref medium
multicast ff00::/8 dev enp34s0 proto kernel metric 256 pref medium
multicast ff00::/8 dev tun0 proto kernel metric 256 pref medium
multicast ff00::/8 dev br-6a8cdaf9abb2 proto kernel metric 256 linkdown pref medium
~/src/random_tests on heads/master ● ● λ sudo ip -6 r a fd99::100/128 via fe80::58e5:27ff:fea6:f594 dev enp34s0
Received 1 of route [inet6 fd99::100 table main type unicast 
    scope global priority 0x400 protocol boot 
    nexthop via fe80::58e5:27ff:fea6:f594 dev enp34s0 <>
]
~/src/random_tests on heads/master ● ● λ sudo ip -6 r chg fd99::100/128 via fe80::885b:7aff:fe5f:653c dev enp34s0
Received 5 of route [inet6 fd99::100 table main type unicast 
    scope global priority 0x400 protocol boot 
    nexthop via fe80::885b:7aff:fe5f:653c dev enp34s0 <>
]
~/src/random_tests on heads/master ● ● λ sudo ip -6 r chg fd99::100/128 via fe80::885b:7aff:fe5f:653c dev enp34s0
~/src/random_tests on heads/master ● ● λ sudo ip -6 r chg fd99::100/128 via fe80::885b:7aff:fe5f:653c dev enp34s0
~/src/random_tests on heads/master ● ● λ Received 5 of route [inet6 ff00::/8 table main type multicast 
    scope global priority 0x100 protocol kernel 
    nexthop dev veth83a87b1 <>
]
Received 5 of route [inet6 fe80::/64 table main type unicast 
    scope global priority 0x100 protocol kernel 
    nexthop dev veth83a87b1 <>
]
Received 2 of route [inet6 fe80::/64 table main type unicast 
    scope global priority 0x100 protocol kernel 
    nexthop dev veth83a87b1 <>
]
Received 2 of route [inet6 ff00::/8 table main type multicast 
    scope global priority 0x100 protocol kernel 
    nexthop dev veth83a87b1 <>
]

~/src/random_tests on heads/master ● ● λ ip -6 r
::1 dev lo proto kernel metric 256 pref medium
fd99::100 via fe80::885b:7aff:fe5f:653c dev enp34s0 metric 1024 pref medium
anycast fe80:: dev enp34s0 proto kernel metric 0 pref medium
anycast fe80:: dev tun0 proto kernel metric 0 pref medium
anycast fe80:: dev br-6a8cdaf9abb2 proto kernel metric 0 pref medium
fe80::/64 dev enp34s0 proto kernel metric 256 pref medium
fe80::/64 dev tun0 proto kernel metric 256 pref medium
fe80::/64 dev br-6a8cdaf9abb2 proto kernel metric 256 linkdown pref medium
multicast ff00::/8 dev enp34s0 proto kernel metric 256 pref medium
multicast ff00::/8 dev tun0 proto kernel metric 256 pref medium
multicast ff00::/8 dev br-6a8cdaf9abb2 proto kernel metric 256 linkdown pref medium
~/src/random_tests on heads/master ● ● λ sudo ip -6 r d fd99::100/128 
Received 2 of route [inet6 fd99::100 table main type unicast 
    scope global priority 0x400 protocol boot 
    nexthop via fe80::885b:7aff:fe5f:653c dev enp34s0 <>
]
~/src/random_tests on heads/master ● ● λ ip -6 r
::1 dev lo proto kernel metric 256 pref medium
anycast fe80:: dev enp34s0 proto kernel metric 0 pref medium
anycast fe80:: dev tun0 proto kernel metric 0 pref medium
anycast fe80:: dev br-6a8cdaf9abb2 proto kernel metric 0 pref medium
fe80::/64 dev enp34s0 proto kernel metric 256 pref medium
fe80::/64 dev tun0 proto kernel metric 256 pref medium
fe80::/64 dev br-6a8cdaf9abb2 proto kernel metric 256 linkdown pref medium
multicast ff00::/8 dev enp34s0 proto kernel metric 256 pref medium
multicast ff00::/8 dev tun0 proto kernel metric 256 pref medium
multicast ff00::/8 dev br-6a8cdaf9abb2 proto kernel metric 256 linkdown pref medium
~/src/random_tests on heads/master ● ● λ sudo ip -6 r a fd99::100/128 via fe80::58e5:27ff:fea6:f594 dev enp34s0
Received 1 of route [inet6 fd99::100 table main type unicast 
    scope global priority 0x400 protocol boot 
    nexthop via fe80::58e5:27ff:fea6:f594 dev enp34s0 <>
]

Added some testlog with patch as it was shown in #288

tlsalmin avatar Aug 20 '21 11:08 tlsalmin

Same reply as: https://github.com/thom311/libnl/pull/277#issuecomment-1099212727


Of course it should be fixed. But without first adding sensible unit-tests for all of this, I don't feel confident to merge this (and quite possibly, this patch is not sufficient).

I will add some unit-tests first...

Sorry for the disappointing reply.

thom311 avatar Apr 14 '22 13:04 thom311

the caching of routes clearly has issues. But they go beyond this patch.

the major problem is that route_obj_ops.oo_id_attrs contains

    .oo_id_attrs = (ROUTE_ATTR_FAMILY | ROUTE_ATTR_TOS |
                   ROUTE_ATTR_TABLE | ROUTE_ATTR_DST |
                   ROUTE_ATTR_PRIO),
    .oo_id_attrs_get = route_id_attrs_get,

but that is not adequate for routes. It would be closer to what NetworkManager does here and here.

But for IPv6 routes it's more complicated. When you do ip -6 route append, then kernel internally merges routes that only differ by their IPv6 next hop. So on the one hand, the next hop is part of the route's identity (as you can add two routes that only differ by their next-hop), on the other hand, when you send a RTM_NEWROUTE request to add a new route, then kernel will take an existing route (with same values, except the next-hop) and merge the two together into a multihop (ECMP) route. I think the better way to interpret this from user-space, is that all IPv6 are fundamentally single-hop routes, but kernel on the netlink API (RTM_NEWROUTE/RTM_DELROUTE) pretends that one message can apply to multiple routes. NetworkManager does that here.


Anyway. The point is, to properly fix this, is complicated.

I think the next step would be to significantly improve the unit tests.

What happened a while ago, is that unit tests now can spawn a separate netns. That makes it easier to add test that reconfigure the system (add routes), without requiring root permissions or mess with the system's network configuration. That's in tests/cksuite-all-netns.c.

I don't feel confident merging this just as is. First tests need to be improved, then fixing all those ugly special cases. Maybe get inspired by NetworkManager code. Note that there are various cases (e.g. link down) where kernel intentionally does not send netlink messages. NetworkManager works around that, by (trying to) detecting those cases and fetch a full dump. Unclear whether that would be appropriate for the libnl3 cache. Also, there are various subtle bugs where the netlink messages are wrong or missing. That makes this tricky.

Closing. I am sorry for that.

thom311 avatar Jul 25 '23 10:07 thom311