libnl icon indicating copy to clipboard operation
libnl copied to clipboard

Unexpected behavior in RTM_NEWROUTE and RTM_DELROUTE when using events through nl_cache_mngr_add_cache_v2() after running ip route change command in IPv6

Open lpares12 opened this issue 6 years ago • 5 comments

When trying to listen for changes in the IPv6 route table using nl_cache_mngr_add_cache_v2(), some events received are misaligned with the kernel table. While with IPv4, same steps produce no misalignment with IPv4 route table in kernel.

The kernel used is 4.14.49-OpenNetworkLinux, and the commit for libnl is 7b167ef85f6eb4d7faca349302478b2dc121e309

Methods used to set up the events callback are:

nl_cache_mngr_alloc(sk, NETLINK_ROUTE, NL_AUTO_PROVIDE, &mngr);
rtnl_addr_alloc_cache(sk, &addr_cache);
nl_cache_mngr_add_cache_v2(mngr, addr_cache, (change_func_v2_t)&event_callback, this);
rtnl_route_alloc_cache(sk, AF_UNSPEC, 0, &route_cache);
nl_cache_mngr_add_cache_v2(mngr, route_cache, (change_func_v2_t)&event_callback, this);
nl_cache_mngr_get_fd(mngr);

cache_mngr_fd is read with epoll_wait and using nl_cache_mngr_data_ready(mngr); we process the events, which ends up calling our callback event_callback(struct nl_cache *cache, struct nl_object *old_obj, struct nl_object *new_obj, uint64_t diff, int action, void* data)

Depending on action received (NEW/DEL/CHANGE) old_obj or new_obj parameters are used to retrieve the type of the message with the function nl_object_get_msgtype() and then the combination of action+type specify what to do.

When a msg_type RTM_NEWROUTE or RTM_DELROUTE is received, this is what we do:

  • NL_ACT_NEW: we install route in new_obj parameter with all its next hops.
  • NL_ACT_DEL: we remove route in old_obj parameter with all its next hops.
  • NL_ACT_CHANGE: old_obj route is removed (with all its next hops) and new_obj route is added (also with all its next hops)

old_obj and new_obj are casted to rtnl_route and parsed, using rtnl_route_get_nnexthops() in combination with rtnl_route_nexthop_n() to retrieve the next hops.

We have observed that after manipulating the kernel table with ip route command, creating, changing and removing IPv6 routes, the events received are not consistent. Problem is always reproduced performing the following steps:

$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024

An IPv6 route is added, and logs show that a NEW_ROUTE event to create a route with one next hop is correctly received:

$ ip route add 2050::/64 via 2012::2
$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 
2050::/64 via 2012::2 dev testing_iface  metric 1024

Received a NL event of type NEW_ROUTE and action NEW
Received an event to add an IPv6 route prefix: 2050::/64, next hop: 2012::2

Then a next hop is added with change command and a correct event is received, with CHANGE action one old next hop and two new next hops: :

$ ip route change 2050::/64 nexthop via 2012::2 nexthop via 2012::3
$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 
2050::/64  metric 1024 
        nexthop via 2012::2  dev testing_iface weight 1
        nexthop via 2012::3  dev testing_iface weight 1

IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE
IPv6: NLM_F_CREATE should be set when creating new route
Received a NL event of type NEW_ROUTE and action CHANGE
Received an event to change (remove) IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to change (add) IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to change (add) IPv6 route 2050::/64, next hop: 2012::3

Now, a next hop is removed, also with a correct change event.

$ ip route change 2050::/64 nexthop via 2012::2
$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 
2050::/64 via 2012::2 dev testing_iface  metric 1024 

Received a NL event of type NEW_ROUTE action CHANGE
Received an event to change (remove) IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to change (remove) IPv6 route prefix: 2050::/64, next hop: 2012::3
Received an event to change (add) an IPv6 route prefix: 2050::/64, next hop: 2012::2

Adding a next hop again, produces an incorrect event, even though routing table is correct. As shown below, a CHANGE event is received but instead of having one next hop to remove, and two to add, the old_obj parameter contains three next hops, even though next hop 2012::3 previously did not exist in the route.

$ ip route change 2050::/64 nexthop via 2012::2 nexthop via 2012::3
$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 
2050::/64  metric 1024 
        nexthop via 2012::2  dev testing_iface weight 1
        nexthop via 2012::3  dev testing_iface weight 1

kernel: [172591.706498] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE
kernel: [172591.713715] IPv6: NLM_F_CREATE should be set when creating new route
Received a NL event of type NEW_ROUTE action CHANGE
Received an event to change (remove) an IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to change (remove) an IPv6 route prefix: 2050::/64, next hop: 2012::3
Received an event to change (remove) an IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to change (add) an IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to change (add) an IPv6 route prefix: 2050::/64, next hop: 2012::3

Finally, when removing the route, the action DELETE is received and the old_obj contains the correct next hops.

$ ip route del 2050::/64
$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 

Received a NL event of type NEW_ROUTE and action DELETE
Received an event to remove an IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to remove an IPv6 route prefix: 2050::/64, next hop: 2012::3

Another way to reproduce the issue is following this steps:

Route is added directly with two next hops:

$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 
$ ip route add 2050::/64 nexthop via 2012::2 nexthop via 2012::3
$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 
2050::/64  metric 1024 
        nexthop via 2012::2  dev testing_iface weight 1
        nexthop via 2012::3  dev testing_iface weight 1

Received a NL event of type NEW_ROUTE and action CREATE
Received an event to add an IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to add an IPv6 route prefix: 2050::/64, next hop: 2012::3

Then, route is changed to one next hop:

$ ip route change 2050::/64 via 2012::2
$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 
2050::/64 via 2012::2 dev testing_iface  metric 1024 

Received a NL event of type NEW_ROUTE and action CHANGE
Received an event to change (remove) IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to change (remove) IPv6 route prefix: 2050::/64, next hop: 2012::3
Received an event to change (add) IPv6 route prefix: 2050::/64, next hop: 2012::2

Route is deleted:

$ ip route del 2050::/64
$ ip -6 r s
2012::/64 dev testing_iface  proto static  metric 1024 

Received a NL event of type DEL_ROUTE and action CHANGE
Received an event to change (remove) IPv6 rpute prefix: 2050::/64, next hop: 2012::2
Received an event to change (remove) IPv6 route prefix: 2050::/64, next hop: 2012::3
Received an event to change (remove) IPv6 route prefix: 2050::/64, next hop: 2012::2
Received an event to change (add) IPv6 route prefix: 2050::/64, next hop: 2012::2

In this last case, a DEL_ROUTE is received with action CHANGE, the old_obj contains three next hops, and the new_obj contains one. When according to previous tests with ip route del, a NEW_ROUTE with action DELETE and only one next hop to remove is expected.

The main issue seems to be that the behavior for events is undefined when using ip route change in IPv6 routes. Also, note that with IPv4 we never receive a DEL_ROUTE event.

lpares12 avatar Sep 04 '19 10:09 lpares12

I think this is the same issue as https://github.com/thom311/libnl/issues/224#issuecomment-526910349

thom311 avatar Sep 04 '19 10:09 thom311

We are unsure this is the case, since it looks like the events are correctly received in most cases. The issue seems to appear after a nexthop is removed and added again, and as far as I understand, #224 should not affect the msg_type that we receive (DEL_ROUTE instead of NEW_ROUTE) in some cases.

lpares12 avatar Sep 04 '19 12:09 lpares12

I think it is related. libnl3 assumes that the ID of a routes is

	.oo_id_attrs		= (ROUTE_ATTR_FAMILY | ROUTE_ATTR_TOS |
				   ROUTE_ATTR_TABLE | ROUTE_ATTR_DST |
				   ROUTE_ATTR_PRIO),

here

This is not what kernel does (rhbz#1337855).

I think NetworkManager gets that slightly better here, but even that is only correct for the route attributes that NetworkManager knows (it does not take into account attributes that NM does not implement or which will be added in the future).

Anyway, so if libnl3 receives a route (RTM_NEWROUTE), it compares the cache and determines whether the route was added/modified/deleted. "compare the cache" means to see whether a route with the same ID already exists. But the ID operator is not what kernel does.

ip route change is even worse, because that one will replace another route (according to the ID). That is what https://bugzilla.redhat.com/show_bug.cgi?id=1337860 is about.

I link to kernel bugs, but this is of course a libnl3 bug (that should be fixed). However, what kernel provides as API here, is quite difficult to get right.

should not affect the msg_type that we receive (DEL_ROUTE instead of NEW_ROUTE) in some cases.

I think it affects it. If you look at https://github.com/thom311/libnl/blob/7b167ef85f6eb4d7faca349302478b2dc121e309/lib/cache.c#L796 , libnl makes decisions based on how the object gets modified during the cache update. It does not merely forward the event's type->mt_act. This is also necessary, if you consider rhbz#1337860.

thom311 avatar Sep 04 '19 12:09 thom311

We do not understand, if the problem is the ID, why is it that some of the events are correctly processed then.

For example, in the following scenario:

ip route add 2050::/64 nexthop via 2012::2 nexthop via 2012::3
ip route change 2050::/64 via 2012::2
ip route del 2050::/64

Add command correctly sends a NEW_ROUTE event with action NEW and two next hops, so we are expecting the cache to save, internally somehow, the route 2050::/64 -> {2012::2, 2012::3}.

Next command, the change, produces a NEW_ROUTE with CHANGE action. We understand that the prefix (plus some other things) forms the ID (which acts as a key) to check the cache and get the old route. Old object contains the two next hops from the old route (2012::2 and 2012::3), and new object contains the new route 2050::/64 -> 2012::2. If it has properly found that the old route contains to next hops, we expect cache manager to remove those and replace the route with the new next hop. In that case, cache should contain something like 2050::/64 -> 2012::2.

And finally, in the delete command, a DEL_ROUTE with action CHANGE is produced. In this case, using that ID that you provided, the route 2050::/64 -> 2012::2 should be matched so only a remove for 2012::2 should be received.

If the problem is with the ID, why instead of this one remove that we should receive we are receiving 3 removes + 1 add? How did the cache reach that state? The only thing we can think of is that the IPv6 routes with multiple next hops are saved with individual routes, and even then, we don't fully understand how it would affect us only when removing next hops from routes and not when also adding.

lpares12 avatar Sep 05 '19 16:09 lpares12

@thom311 Coming back to this old bug report, which is still an issue for us.

From your last message, your conjecture is that this is somehow a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1337855

However this bug has been closed in favour of https://bugzilla.redhat.com/show_bug.cgi?id=1722728, which seems not accessible for me, so I can't see the status.

Is this fixed by now? Is there any plan to fix this?

If not, could you share some broad ideas we could look into to fix it "the right way", and open a PR?

Thanks

cc @lpares12

msune avatar May 07 '21 00:05 msune