libnl

Duplicated rtnl_link entries in nl_cache

Open nscnd opened this issue 3 years ago • 5 comments

Hi there,

Since commit 764c30a272b452e423740da60eaf7cce75895953, there seems to be a bug involving rtnl_link objects in an nl_cache: we now get duplicated entries for a single ifindex, which seems wrong.

I can see it in an application that follows this pattern:

  1. create an nl_sock
  2. create an nl_cache_mngr with nl_cache_mngr_alloc
  3. create a link cache with nl_cache_mngr_add and "route/link"
  4. monitor changes from our event loop with nl_cache_mngr_get_fd / nl_cache_mngr_data_ready

Later on in the program, after some links have been added, removed or modified:

  5. enumerate all links in the cache using nl_cache_foreach
  6. display their index with rtnl_link_get_ifindex

This used to give me the following output:

-> link ptr=0x0xffff8d2eb590x,ifindex=1
-> link ptr=0x0xffff8d2eb020x,ifindex=2
-> link ptr=0x0xffff8d2ebaa0x,ifindex=5
-> link ptr=0x0xffff8d2e9550x,ifindex=6
-> link ptr=0x0xffff8d2e8010x,ifindex=7
-> link ptr=0x0xffff8d2e8a90x,ifindex=8
-> link ptr=0x0xffff8dba1aa0x,ifindex=7
-> link ptr=0x0xffff8d2e7570x,ifindex=4
-> link ptr=0x0xffff8d2ea560x,ifindex=3
-> link ptr=0x0xffff8dba1560x,ifindex=9

And now it gives:

-> link ptr=0x0xffffb5d0b590x,ifindex=1
-> link ptr=0x0xffffb5d0b020x,ifindex=2
-> link ptr=0x0xffffb5d0baa0x,ifindex=5
-> link ptr=0x0xffffb5d09550x,ifindex=6
-> link ptr=0x0xffffb5d08010x,ifindex=7
-> link ptr=0x0xffffb5d08a90x,ifindex=8
-> link ptr=0x0xffffb65c1020x,ifindex=7
-> link ptr=0x0xffffb65c1590x,ifindex=4
-> link ptr=0x0xffffb65c1ae0x,ifindex=4
-> link ptr=0x0xffffb5d06570x,ifindex=1
-> link ptr=0x0xffffb5cdbad0x,ifindex=3
-> link ptr=0x0xffffb5d0a560x,ifindex=3
-> link ptr=0x0xffffb5d0a020x,ifindex=9
-> link ptr=0x0xffffb5d0aab0x,ifindex=9
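A quick way to compare the two dumps is to tally how often each ifindex appears. This is a minimal plain-C sketch, independent of libnl; `count_duplicates`, `MAX_IFINDEX`, `before` and `after` are illustrative names, and the arrays hold the ifindex values copied from the two outputs above:

```c
#include <stddef.h>

/* Illustrative upper bound; assumes every ifindex in the input is below it. */
#define MAX_IFINDEX 64

/* Return how many distinct ifindex values occur more than once in idx[0..n). */
static int count_duplicates(const int *idx, size_t n)
{
    int seen[MAX_IFINDEX] = {0};
    int dups = 0;

    for (size_t i = 0; i < n; i++) {
        if (idx[i] < 0 || idx[i] >= MAX_IFINDEX)
            continue; /* out of range for this sketch: ignored */
        if (++seen[idx[i]] == 2)
            dups++; /* count each repeated ifindex only once */
    }
    return dups;
}

/* ifindex columns of the old and new dumps */
static const int before[] = {1, 2, 5, 6, 7, 8, 7, 4, 3, 9};
static const int after[]  = {1, 2, 5, 6, 7, 8, 7, 4, 4, 1, 3, 3, 9, 9};
```

With these inputs, `count_duplicates(before, 10)` returns 1 (ifindex 7 already appears twice in the old dump), while `count_duplicates(after, 14)` returns 5 (ifindexes 1, 3, 4, 7 and 9).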

Surely this isn't expected behaviour, right? Thanks!

nscnd, May 28 '21

Hi there,

So I dug a little deeper, and it seems that rtnl_link objects of different address families live in the cache at the same time (which, for a newcomer, is quite unexpected).

The turning point seems to be commit 125119aff05e16f150ad881dc9479686495cb5bd. Before that commit, there were never any AF_BRIDGE objects in the cache, since they were:

  • not queried during cache creation
  • discarded if received on the socket nonetheless

After that commit we:

  • parse AF_BRIDGE messages when they are received
  • query them when filling the cache if NL_CACHE_AF_ITER is set

On my 5.4 kernel, even if I do not set NL_CACHE_AF_ITER explicitly, I still get AF_BRIDGE objects:

  • for "bridge masters" I only have AF_BRIDGE rtnl_link objects in the cache
  • for "bridge slaves" I have both AF_BRIDGE & AF_UNSPEC rtnl_link objects in the cache

For me, the problem is that the cache can now hold objects of different families for the same link at the same time, each carrying a different set of information, and we don't control which one we get when using standard APIs like rtnl_link_get / rtnl_link_get_by_name.
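The ambiguity can be illustrated without libnl. In this sketch, `struct link_entry`, `find_by_ifindex` and `find_by_ifindex_family` are hypothetical stand-ins for cached rtnl_link objects and lookups, not libnl API: a lookup keyed on ifindex alone returns whichever entry happens to come first in the cache, while adding the address family to the key selects a specific object.

```c
#include <stddef.h>
#include <sys/socket.h> /* AF_UNSPEC, AF_BRIDGE (Linux) */

/* Hypothetical stand-in for a cached rtnl_link object. */
struct link_entry {
    int ifindex;
    int family; /* e.g. AF_UNSPEC or AF_BRIDGE */
};

/* Lookup by ifindex only: the result depends on cache order. */
static const struct link_entry *
find_by_ifindex(const struct link_entry *c, size_t n, int ifindex)
{
    for (size_t i = 0; i < n; i++)
        if (c[i].ifindex == ifindex)
            return &c[i];
    return NULL;
}

/* Lookup by (ifindex, family): returns the first entry matching both. */
static const struct link_entry *
find_by_ifindex_family(const struct link_entry *c, size_t n,
                       int ifindex, int family)
{
    for (size_t i = 0; i < n; i++)
        if (c[i].ifindex == ifindex && c[i].family == family)
            return &c[i];
    return NULL;
}
```

For example, with a cache holding `{4, AF_BRIDGE}` followed by `{4, AF_UNSPEC}`, `find_by_ifindex(cache, 2, 4)` yields the AF_BRIDGE entry purely because of ordering.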

Did I miss something? Is this still expected and desired behavior? Thanks for any input on this.

nscnd, Jun 04 '21

Can you share your code?

chengyechun, Dec 20 '21

Hi nscnd,

Did you solve that issue? I can't get AF_UNSPEC entries for bridges (and I need them, as my bridges have another master interface, which is a VRF), even when I set NL_CACHE_AF_ITER.

zstas, Jan 07 '22

Hello there, to be honest, the more I used this lib, the more weird things I discovered, so I switched to hand-rolled netlink code for most of my bridge manipulation operations. Sorry for the lack of help, guys...

nscnd, Jan 11 '22

A solution to the problem (if you still need one) might be to loop over the cache with nl_cache_foreach and pluck out the object you want that way, or to do whatever operation you need inside the callback you pass to it.
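The callback approach can be sketched as follows. To keep it runnable without libnl, `struct link_entry`, `cache_foreach`, `struct pick_arg` and `pick_cb` are hypothetical stand-ins that mimic the shape of nl_cache_foreach() and its `void (*cb)(struct nl_object *, void *)` callback. In real libnl code, the callback would instead cast the `struct nl_object *` to `struct rtnl_link *` and read rtnl_link_get_ifindex() and rtnl_link_get_family(), taking a reference with nl_object_get() if the object must outlive the cache.

```c
#include <stddef.h>
#include <sys/socket.h> /* AF_UNSPEC, AF_BRIDGE (Linux) */

/* Hypothetical stand-in for a cached rtnl_link object. */
struct link_entry {
    int ifindex;
    int family;
};

/* Mimics nl_cache_foreach(): invoke cb on every object in the cache. */
static void cache_foreach(struct link_entry *cache, size_t n,
                          void (*cb)(struct link_entry *, void *), void *arg)
{
    for (size_t i = 0; i < n; i++)
        cb(&cache[i], arg);
}

/* Search criteria plus the result slot filled in by the callback. */
struct pick_arg {
    int ifindex;
    int family;
    struct link_entry *found;
};

/* Keep the first object matching both ifindex and family; ignore the rest. */
static void pick_cb(struct link_entry *obj, void *arg)
{
    struct pick_arg *p = arg;

    if (p->found == NULL && obj->ifindex == p->ifindex &&
        obj->family == p->family)
        p->found = obj;
}
```

Calling `cache_foreach(cache, n, pick_cb, &want)` with `want = { 3, AF_UNSPEC, NULL }` leaves `want.found` pointing at the AF_UNSPEC entry for ifindex 3, or NULL if the cache has none.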

avlec, Jul 12 '22