rtnl_route_alloc_cache() can't get all kernel route table, some filtered by same hash key
using command ip -6 route
fe80::/64 dev eth3 proto kernel metric 256
fe80::/64 dev eth4 proto kernel metric 256
fe80::/64 dev eth5 proto kernel metric 256
fe80::/64 dev br-lan proto kernel metric 256
but use rtnl_route_alloc_cache() only the first one is fetched.
I have debug it.
because route_obj_ops has
.oo_keygen = route_keygen
so cache will use hashtable, __cache_add() will insert it to the hash table, the above route entries has the same hash keys. but the route_compare() routine only compare following attrs:
.oo_id_attrs = (ROUTE_ATTR_FAMILY | ROUTE_ATTR_TOS | ROUTE_ATTR_TABLE | ROUTE_ATTR_DST | ROUTE_ATTR_PRIO),
next hop is not compared. So above 4 routes are treat same. Is this design or a bug?
@thom311
Yes, that is a known issue. I wouldn't call it "design" though.
See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1337860 https://bugzilla.redhat.com/show_bug.cgi?id=1337855
Roopa Prabhu sent a patches to the mailing list in November 2015. But for some reason, the mailing list rejected the emails and I dropped the ball on them...
For what it's worth, I think NetworkManager handles this correctly. See
On top of that, NetworkManager's route cache is a hash-table, that also links all elements (to preserve their order). So, when there is a "replace" netlink event, care is taken to replace the right route (code). Unfortunately, that is all non-trivial.
@thom311 @yuanjianpeng This is kinda old but I hit the same thing. Isn't the hash function only used for cache lookup? The bugzilla you've sent are related to kernel I suppose.
here's the stack call: nl_cache_find -> __cache_fast_lookup -> nl_hash_table_lookup -> nl_object_keygen
and here we get to this nl_object_keygen function, which for rtnl_route will use route_keygen function. And in that very function parameters such as routing protocol (i.e. RIP, OSFP, etc.), device name, ifndex are not used. This leads to a collision, making the hash lookup useless. Do you know how we can address this issue? Maybe make the user decide which route params are taken into account when creating the key?
Do you know how we can address this issue? Maybe make the user decide which route params are taken into account when creating the key?
By implementing the caching functions (equality and hash) correctly, according to what kernel considers -- so that "different" routes don't get combined when put into the cache.
I think a good start is comparing to what NetworkManager does, which gets this "mostly" right (here).
Unfortunately, it's not easy. Efforts for improving this are welcome. They also should be covered by unit tests. That makes it high effort.
Unit tests are not a problem. As for the equality function, do you know if the current one is not enough? https://github.com/thom311/libnl/blob/main/lib/route/route_obj.c#L395
As for the keygen, https://github.com/thom311/libnl/blob/main/lib/route/route_obj.c#L335 Would it be enough to extend this route_hash_key struct? But I'm not sure how hashing should done for nexthops.
One more question, about projects API/ABI. This change would have a big impact on route lookup behavior, are we ok with that? Or do we need another API?
Unit tests are not a problem. As for the equality function, do you know if the current one is not enough? https://github.com/thom311/libnl/blob/main/lib/route/route_obj.c#L395
That's just the compare function. It matters that the caller passes the right uint64_t attrs argument. Search for oo_id_attrs (which is an inadequate list of attributes).
As for the keygen, https://github.com/thom311/libnl/blob/main/lib/route/route_obj.c#L335 Would it be enough to extend this route_hash_key struct? But I'm not sure how hashing should done for nexthops.
In any case, the hashing should agree with the oo_id_attrs and be about the identifying attributes. I guess, theoretically, it also may only be a subset of those, but optimally the oo_id_attrs and the hash function fully agree.
One more question, about projects API/ABI.
Breaking API/ABI is best avoided. Here, the ABI/API wouldn't change.
Well, the behavior would certainly change, in a notable way (you may call that an API break). But in this case, I think it qualifies as a necessary bugfix. I think it's desirable to change behavior for the current cache implementation (if done correctly).
there is also the problem, that sometimes the netlink events are not "complete". Or better, they have to be inferred.
Having a cache means that routes have a clear identifier (e.g. a set of attributes). You cannot configure two routes in kernel, which have the same values in these attributes. For example, the ifindex is part of the identifying attributes. If you do ip route replace, then kernel may drop another route ("other" according to the route identifier), and send a RTM_NEWROUTE message for the new route. Granted, the message has a NLM_F_REPLACE flag, so you can tell something was replaced. But then you are supposed to search the cache for the route that was replaced. That is hard (it also means, the routes must be kept sorted in the order how they were dumped, because the first route in that order was replaced).
Otherwise, an ip route replace will leave zombie entries that are not actually there anymore.
I faced the same issue with some IPv4 routes.
I clarify with an example.
This is the output of ip route list:
default dev eth0 proto static
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.120
192.168.2.0/24 dev eth1 proto kernel scope link src 192.168.1.120
192.168.3.0/24 dev eth2 proto kernel scope link src 192.168.2.120
239.30.1.100 dev eth2 proto static
239.30.1.100 dev eth1 proto static
And this what I get on the same machine with nl-route-list -t 254:
inet default table main type unicast via dev eth0
inet 192.168.0.0/24 table main type unicast via dev eth0
inet 192.168.2.0/24 table main type unicast via dev eth1
inet 192.168.3.0/24 table main type unicast via dev eth2
inet 239.30.1.100 table main type unicast via dev eth2
You can notice that one route to 239.30.1.100 is missing.
Since both routes to 239.30.1.100 are hashed using the same data, only one of them is added to the hash table ending up with a mismatch between the actual situation (the kernel), correctly displayed by iproute2, and the one reported by libnl (via nl-route-list).
I solved patching route_obj.c, extending the data on which the hash value is computed appending the bytes corresponding to the interface index of all the next hops included into the route. This should not break any compatibility, but ensure that all the routes returned by the kernel that differs only for the 'via/gateway' data are stored and returned correctly. You can see the code I added in the attached file.
0001-route-hashing-with-nexthop.tar.gz
I am aware that this is probably not a solution that covers all possible cases (in terms of different routes sharing the same key data), but I think it can provide a starting point and/or insights for an "official" implementation.