libnl lib/route: add initial support for br_vlan global

I am working on adding initial support for a new bridge vlan module for libnl in order to support RTM_{NEW,DEL,GET}VLAN messages. Our initial need is to be able to set bridge vlan global options, which are sent with the BRIDGE_VLANDB_GLOBAL_OPTIONS nested attribute.

I would like to design the initial support such that extending the module to support BRIDGE_VLANDB_ENTRY, GET and DEL requests, and caching is possible.

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

Oct 01 '24 20:10 ronand-atl

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

Sorry, I cannot competently comment on this question. It requires to well understand the kernel API, that is, how are those things exposed on the netlink API, so we can find which objects can best represent them.

Overall, the patch seems to be on the right track (I think!). Thank you.

Oct 02 '24 07:10 thom311

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented.

Maybe we can find a way of combining the effort.

Oct 11 '24 13:10 KanjiMonster

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

Sorry, I cannot competently comment on this question. It requires to well understand the kernel API, that is, how are those things exposed on the netlink API, so we can find which objects can best represent them.

Overall, the patch seems to be on the right track (I think!). Thank you.

This is a similar "issue" here that I also mentioned at https://github.com/thom311/libnl/issues/390 in that the same netlink group has two very different kind of objects it notifies about.

So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc.

So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

Oct 11 '24 13:10 KanjiMonster

@KanjiMonster Hey, thanks for reaching out :)

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented.

Maybe we can find a way of combining the effort.

Nice, luckily we didn't end up duplicating effort. I had a quick look over the patch linked and I noticed the build_bridge_vlan_msg function doesn't seem to do anything? Maybe that part still needs to be implemented.

This is a similar "issue" here that I also mentioned at https://github.com/thom311/libnl/issues/390 in that the same netlink group has two very different kind of objects it notifies about.

So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc.

So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

The global options seem to be conceptually different from the per vlan per port options, so I'm planning on having them in their own module route/br_vlan/global_opts.c. I'm working on a revised version which I should be able to push soon. If they are different modules then there shouldn't be too much overlap. If we want to implement caching then that looks like it might take extra work, demultiplexing nested attributes in messages sent from the kernel through a common br_vlan module, or having both modules process them and ignore nested attributes it doesn't care about.

If you're interested, we could set up a branch on a forked repo so we can work together on a more complete module that can be upstreamed all at once.

Oct 14 '24 04:10 ronand-atl

@KanjiMonster Hey, thanks for reaching out :)

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented. Maybe we can find a way of combining the effort.

Nice, luckily we didn't end up duplicating effort. I had a quick look over the patch linked and I noticed the build_bridge_vlan_msg function doesn't seem to do anything? Maybe that part still needs to be implemented.

Right, we are only listening to netlink, not setting anything, so we only cared about getting notified about changes part (and being able to look stuff up).

For context, we are listing to changes in fdb/bridge/routes/etc to write the state/configuration down to a switch ASIC so these things get hardware accelerated in the background. A bit like switchdev, but in userspace.

This is a similar "issue" here that I also mentioned at #390 in that the same netlink group has two very different kind of objects it notifies about. So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc. So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

The global options seem to be conceptually different from the per vlan per port options, so I'm planning on having them in their own module route/br_vlan/global_opts.c. I'm working on a revised version which I should be able to push soon. If they are different modules then there shouldn't be too much overlap. If we want to implement caching then that looks like it might take extra work, demultiplexing nested attributes in messages sent from the kernel through a common br_vlan module, or having both modules process them and ignore nested attributes it doesn't care about.

But caching is the interesting part in libnl ;-). And the complicated one, since IIRC libnl only allows one module per message type, so we can't have two modules for RTM_NEWVLAN. So we would need to change libnl to allow that, but at the same time it would also benefit the mdb implementation.

And I just remembered my "programming is hard, let's go shopping" moment when I initially looked at implementing it:

The kernel does not send out a initial global opts message when a new vlan group is created (the first time a vlan is assigned to any bridge member), only if any global options are changed it sends out one. So you won't know that a vlan is assigned MSTI 0 unless you explicitly ask, or change any other option.

Likewise, the kernel does not send out a RTM_DELVLAN for global opts if a vlan group is removed (the last member of a vlan is removed). This resets all vlan global opts, but user space won't know unless userspace does reference tracking. Or tries to get the options for the vlan, which will then result in an error.

And lastly, kernel does not send out any messages to userspace about STP state changes for MSTI, or VLAN STP state changes as a result of them.

Maybe I should just send in kernel patches for these instead of trying to come up with a solution for this in userspace ... .

If you're interested, we could set up a branch on a forked repo so we can work together on a more complete module that can be upstreamed all at once.

I can't do any promises on how much time I can spend on it, but apart from that we can do that.

Oct 14 '24 09:10 KanjiMonster

I've pushed a new version of the code which finishes implementing this module. This module should work as is for allowing the user to modify and query global VLAN options of a bridge master interface. The non-global counterpart, if there is interest, would be implemented in a separate file, followed by caching support.

Some more testing will need to be done to ensure everything works as expected before merging.

Oct 30 '24 23:10 ronand-atl

I've pushed a new version of the code which finishes implementing this module. This module should work as is for allowing the user to modify and query global VLAN options of a bridge master interface. The non-global counterpart, if there is interest, would be implemented in a separate file, followed by caching support.

Some more testing will need to be done to ensure everything works as expected before merging.

How does cache update reporting work for those merged objects? Will the callback know what was merged, or will it need to do its own diff of the "old" and "new" objects?

The latter would be rather inefficient, and I would like to avoid that. Ideally there is one object per {vlan,port,bridge}.

Oct 31 '24 10:10 KanjiMonster

How does cache update reporting work for those merged objects? Will the callback know what was merged, or will it need to do its own diff of the "old" and "new" objects?

I'm not 100% sure what you mean. Could you elaborate? Are you talking about the oo_update function? Currently that's written to just handle the response of a dump request, and not update notifications. To handle update notifications it would need to merge the new entries into the existing object (adding or replacing entries). A two pointer strategy could be used do the operation in O(n + m) time without needing to modify the data structure.

Ideally there is one object per {vlan,port,bridge}.

I currently have one object per bridge master interface which has an entry for each vlan. This applies to the global options, but I'd expect the non global options to also have an object per bridge master or port with an entry per vlan.

Oct 31 '24 21:10 ronand-atl

I currently have one object per bridge master interface which has an entry for each vlan. This applies to the global options, but I'd expect the non global options to also have an object per bridge master or port with an entry per vlan.

This is the core issue, you can't have multiple object types in one cache. Either the cache has global vlan objects, or it has vlan entry objects. It cannot have both.

And you cannot have more than one cache per netlink message type. Once you register your global vlan options object type for RTM_{NEW,DEL}VLAN, you cannot add one for vlan entry objects, and vice versa.

My best Idea so far had been to have a "type" attribute for the object, which tells whether it is a global opts or a vlan entry type. So the id attributes become {ifindex,vid,type}, and depending on the attributes in the RTM_{NEW,DEL}VLAN you would update/create/remove different type of objects.

There is no issue in sending multiple object notifications from one message, libnl handles that fine.

Then maybe we also keep a reference of the vid entry in the gopts, as gopts live only as long as there are vlan members, once the last member is removed, the gopts get implicitly reset to defaults (but there is no explicit netlink message for that).

Nov 01 '24 09:11 KanjiMonster

Though in general as a quick glance review for this PR:

I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.
You are not registering for any netlink message types or any netlink message groups, so the message parser is never invoked. You are missing nl_object_ops::oo_msgtypes etc.

Nov 01 '24 10:11 KanjiMonster

Hmm yeah the caching issue is a tricky one. My initial idea with this PR was to implement the functionality without caching, and then add caching later, once it's figured out. I'm also mindful that the initial design must not do anything that could compromise efforts to add caching later.

I'm thinking some of the caching code could be changed to allow offloading the message handling to an intermediary module that reads each attribute in the message and calls the appropriate handler from either the gopts or the non-gopts module. This would solve this problem you mentioned: "And you cannot have more than one cache per netlink message type". I'll need to have a closer look at the code though to see if this is a viable solution.

In response to your review:

I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.

Perhaps, I'm not sure. Is there a benefit to doing that? Wouldn't that make it more annoying to get and update entries for a single interface since now there's potentially many more entries that aren't grouped or ordered in any way?

You are not registering for any netlink message types or any netlink message groups, so the message parser is never invoked. You are missing nl_object_ops::oo_msgtypes etc.

The message parser does get invoked when requesting the gopts entries for a bridge master with rtnl_br_vlan_gopts_get_kernel. As we don't know about the best way to implement caching, there isn't any code that registers a cache yet.

Nov 06 '24 05:11 ronand-atl

looks like a clone of BSD sys/queue.h

Nov 13 '24 18:11 mcr

@mcr Were you referring to this one? https://github.com/freebsd/freebsd-src/blob/main/sys/sys/queue.h The code doesn't look too similar.

Nov 14 '24 05:11 ronand-atl

In response to your review:

I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.

Perhaps, I'm not sure. Is there a benefit to doing that? Wouldn't that make it more annoying to get and update entries for a single interface since now there's potentially many more entries that aren't grouped or ordered in any way?

One benefit is that look ups for specific vlan's options is quicker, since you can make use of the cache (and hashing). And you can make use of libnl filtering (e.g. "give me all vlan opts with msti set to X"). No way to do that if all vlans are part of the same object.

Since these are global options, there will only be one object per bridge, and in most scenarios, there will only be one bridge. So in practice, having all vlans in one object means there is only one object, so the cache becomes meaningless. And if you want to look up any info, you need to iterate through the linked list(s).

So from a consumer perspective having one object per vlan is much easier to work with.

Nov 18 '24 10:11 KanjiMonster

@KanjiMonster I see. I'll see if I can come up with a revised interface for the module then. Hopefully most of my code is reusable.

Nov 20 '24 04:11 ronand-atl

@ronand-atl: Any progress on it?

Apr 08 '25 23:04 Neustradamus

No, unfortunately. It's unlikely I will be able to work more on this. If anyone wants to continue work on this however I would be happy to let that happen.

Apr 09 '25 03:04 ronand-atl

I tried creating a proof of concept implementation with cache (loosely based what on what we are currently doing). The following issues I found while trying to do it:

Vlans on a bridge get created the first time a vlan is assigned to a port (with default global options), and deleted if the last port is removed form the vlan (which implicitly resets the global options).
The kernel only notifies about changed global vlan options. So a new vlan will have unknown options until they get changed (or one explicitly requests an update), and there is no notification of a vlan being deleted from a bridge.
The VLANDB_ENTRY uses the port's ifindex and does not provide any ifindex for the bridge it belongs to. On the other hand, VLANDB_GLOBAL_OPTIONS have only the bridge ifindex, and no ifindex of any ports (since they are global options). So to match VLANDB_ENTRY objects to bridges, one needs a different source, e.g. a link cache.
There is no way of getting both global options and entry options in one dump. You can only get either, so you need to do two dumps.
When doing the initial dump/sync, libnl does not do updating objects, and adds every message as its own object.

Based on these restrictions, I did the following:

I opted for One object per <bridge_ifindex,vlan_id>, with global options in the object (only msti for testing), and a list of ports that are members of the vlan (which hold the per port vlan options).
Use the link cache to find out the bridge ifindex for VLANDB_ENTRY notifications. So the link cache is a hard requirement for the vlandb cache to work.
Using oo_update(), VLANDB_GLOBAL_OPTIONSs will update the global options of a bridge vlan (if it exists).
Using oo_update(), VLANDB_ENTRYs will add/remove ports to the list of ports.
When there is only one port in bridge vlan left, don't update, but instead delete the object (= vlan is removed from the bridge). This way the global options "reset" to defaults.
Have the vlandb entry twice in the group list to make libnl do two dumps (one for global, one for entries). This is stupid, but it works.
Made nl_cache_refill() use nl_cache_pickup_checkdup() to be able to merge the different object types on dump into one entry. No Idea what issues this may cause for other caches, but I don't care for a proof of concept.

My Proof of concept code is at https://github.com/bisdn/libnl/commits/jogo_vlandb_with_gopts/ (and yes, I know that there are a lot of issues in this code, symbols aren't exported yet etc).

Apr 09 '25 10:04 KanjiMonster

lib/route: add initial support for br_vlan global_opts module