pyroute2
pyroute2 copied to clipboard
How to allow NDB to listen on specific netlink groups
We use pyroute2
on software routers with millions of routes. It seems NDB loads a full snapshot of the namespace in its database, which in our case takes a significant time.
Is there a way to control what netlink
groups NDB binds to?
If not, we would be willing to contribute changes to allow choosing which netlink
groups NDB listens on. I am thinking of a similar change to the one we contributed back in 2016 to support the same functionality on IPDB: https://github.com/svinota/pyroute2/pull/314
Looking at the code, I thought of two ways to achieve this:
-
Allow users to pass initialized
IPRoute
orNetNS
as NDB sources. The user would be free to bind theseIPRoute
andNetNS
instances to specificnetlink
groups as desired. (As of right now, it seems that only dictionaries specifying sources are accepted;IPRoute
andNetNS
are instantiated afterwards.) This is more-or-less what the user interface would look like:netns = NetNS('mynamespace') netns.bind(groups=desired_netlink_groups) src = { 'kind': 'netns', 'instance': netns } with NDB(sources=[src]) as ndb: ...
-
Allow users to pass desired
netlink
groups as part of the NDB source specification. Then, the groups could be passed tonetlink
when binding the socket. This is what the user interface would look like:desired_netlink_groups = RTMGRP_DEFAULTS & (~RTMGRP_IPV4_ROUTE) & (~RTMGRP_IPV6_ROUTE) src = { 'netns': 'mynamespace', 'netlink_groups': desired_netlink_groups } with NDB(sources=[src]) as ndb: ...
Any preference or feedback on any of these approaches? (It seems to me 2 is easier to use and easier to implement!)
Would listening on a subset of groups violate NDB's assumptions? For example, if one configures NDB to not listen to route messages (like in the example above), then we could understandably raise an exception if the user accesses ndb.routes
, but it's unclear to me whether NDB requires listening to all groups (in particular, it would need to support maintaining a partial database).
The reason why NDB accepts a parameters dict instead of a netlink socket instance is that NDB tries to restart sources in the case of failures. NDB stores the parameters in the DB and instantiates the source from scratch in the case of the source error.
TBH I'm not sure it's not an architectural mistake. The reason behind the current implementation is that one instance can have many sources, including remote ones, it may be much cheaper to restart one source instead of the whole instance. If you have an opinion — pls share.
Given the current implementation, the second approach from the proposed looks much easier to implement at the moment, and I can include it in the comint release next week.
Keeping in mind that I'm still not sure that NDB uses a proper approach to manage the event sources.
I haven't thought of multiple sources or restarting of sources, but these seem like more reasons to go for approach 2. (I can also see the motivation why NDB may want to manage the sources directly, so can't fault the current architecture.) It also seems that any changes to the management of event sources could carry over the functionality to choose netlink
groups.
I'll try to get a PR ready in the next few days.
You'll meet two weird things that will be refactored later.
- A builtin messaging protocol to interconnect NDB instances — is going to be dropped and replaced with 0mq or like that. That code may be safely ignored for now, like
pyroute2.ndb.source.SourceProxy
orpyroute2.ndb.main.NDB.__mm__()
. - The way in which the DB management thread communicates with other parts of the system in general, and with the sources in particular. Basically it is a queue-based pipeline, and I plan to keep this architecture, but it will be rewritten in order to drop as many locks as possible. The issue is how to rewrite it without serious performance penalties; one attempt using
concurrent.futures.ThreadPoolExecutor
already failed due to poor performance in some cases.
@cunha a draft fix is pushed: https://github.com/svinota/pyroute2/pull/997
More tests and some docs will follow tomorrow. The implementation as you proposed:
ndb = NDB(sources=[{..., 'netlink_groups': ...}])
https://github.com/svinota/pyroute2/blob/b550a71a8227198df9509be10bd61511f7c521b0/tests/test_linux/test_ndb/test_init.py#L15-L19