pyroute2 implementing support for generic netlink policy dump

Recent kernels have gained support for dumping policy information about generic netlink interfaces.

I want to add support for the CTRL_CMD_GETPOLICY to enable dumping such policy information.

Unfortunately, the way this policy information is dumped causes some parsing problems.

Specifically, the structure is something like

nla_nest_start( CTRL_ATTR_OP_POLICY ) -> nla_nest_start ( command_id ) -> nla_u32( CTRL_ATTR_POLICY_DO ) OR nla_u32 ( CTRL_ATTR_POLICY_DUMP ) nla_nest_end nla_nest_end

specifically the tricky part is that the command_id is dependent on which netlink family is being dumped. Its the command op codes for all commands available in that family. I don't think the current nla_map bits would work here, because we essentially need to handle any valid opcode here.

I am not sure if there is a way to get some sort of custom handling for a given attribute so we can specify this as a nested attribute in a different way when parsing the message?

Jul 29 '22 19:07 jacob-keller

One can redefine encode and decode methods of nla child classes.

A simple example: https://github.com/svinota/pyroute2/blob/6a80cd3df22942097ba5ce2622fbeac0bba8150f/pyroute2/netlink/generic/wireguard.py#L193-L200

A more complicated example: https://github.com/svinota/pyroute2/blob/6a80cd3df22942097ba5ce2622fbeac0bba8150f/pyroute2/netlink/nl80211/init.py#L752-L801

In a custom encode/decode methods you can do literally anything with the buffer, but take a look at the existing code, since the API may be a bit weird (hope we turn it into something more reasonable one day)

Jul 29 '22 20:07 svinota

I'm having trouble figuring out how to do something here. I can override the decode function but its not really clear what to put there.

The problem is that what I want is basically to overwrite the nla_map logic and just allow any attribute type to be the nesting attribute. I could hard code it by just making a pre-generated nla_map from 0 up to the maximum attribute size, but that seems like super overkill...

I don't really want to have to duplicate the decode logic, and I don't know how to safely extract the attribute during decoding in a way that doesn't break the rest of the parsing. I also still want the sub-attributes to get decoded nicely and placed in as such.

Jul 29 '22 22:07 jacob-keller

Let me dig a bit deeper into the kernel code on this weekend, let's see what we can do. At least you know what to expect there, so I will ask you here for details in the case of any questions.

I believe we solve that by Tuesday the latest.

Jul 29 '22 22:07 svinota

I did get something sort of hacky working, which I'll make as a PR. I basically just make comprehensions to fill in nla_map. It's pretty ugly, and I think a proper policy dump would want to split the attributes out into more useful data structures.

Jul 29 '22 23:07 jacob-keller

Here's what I managed so far: https://github.com/svinota/pyroute2/pull/981

I think what we really want is some sort of option to basically say "this will have an id-based nested attribute which has subattributes of this policy". This is type of nested attribute is done 3 different times in the policy dumping code including:

to specify which opcodes have which policyt
to specify a policy index, i.e. when a family has multiple policies (one per command typically)
to specify the attribute a policy is for

It would be nice to have a generic facility which you can set some sort of name for the attribute number as well as what sub-class to use for parsing the nested attributes underneath.

Jul 29 '22 23:07 jacob-keller

Got it, thanks a lot for the PR!

Yep, I need to fix the NLA parser, and I believe I know in what way.

Jul 30 '22 10:07 svinota

@jacob-keller here is the plan:

[x] merge #981
[x] fix code format
[x] merge #982
[x] fix the code to use the new API

Jul 31 '22 20:07 svinota

Ok, it should be ready now. Take a look when you have a minute.

https://github.com/svinota/pyroute2/blob/6a13ef8dc6086eb43179a1e4c7e0af6d2e55cf36/pyroute2/netlink/init.py#L2458-L2482

Jul 31 '22 22:07 svinota

That's great! I think the only other parsing issue is the ability to extract the value back out when analyzing the policy data later. Perhaps we could also add an attr_id field to the dictionary so that you could directly extract the ID? Come to think of it, that would mean you don't need a unique 'name' for each attribute in that case which would make parsing even easier I think.

Aug 01 '22 17:08 jacob-keller

Actually looked at this closer and I am very confused by the way nla_slot works and how the NLAs get represented.

I would have expected something like a list of attributes, each attribute with a name, type, flags, and value. Then the value itself could either be a raw low-level value, or its own nla.

If the nested attribute properly marks itself as a NLA_F_NEST the parsing could automatically parse the nested attribute, otherwise it would need to know its a nested type from the nla_map.

The current "nla_slot" represents this as a tuple which is very confusing, when it would make more sense to represent it more like a structure with a dictionary of the relevant fields instead?

It also makes printing weird because the nla_slot uses repr which condenses it down to a tuple and makes adding an attribute difficult because we'd have a weird issue with being able to confuse the tuple slots.

I'm not sure what the backwards compatibility issues this would cause to change the format so much though..

Aug 01 '22 18:08 jacob-keller

Historically the parsed structure was like that:

{
  'header': {
    'length': ...,
    'type': ...,
    'flags': ...,
    'sequence_number': ...,
    'pid': ...,
    'error': ...,
  },
  'field1': ...,
  'field2': ...,
  'attrs': [
    ('NLA_NAME', nla),
    ...,
  ],
}

So the simple field can be get with msg['field1'], while NLA:s can be accessed via msg['attrs'][...] or msg.get_attr('NLA_NAME'), or msg.get(('NLA_NAME', 'GRANDCHILD_NLA_NAME, ...))`.

An extremely huge statistics NLA:s in RTNL link messages and a slow parser leaded to a solution, where NLA:s are parsed on demand — only when you try to access them. Thus saving some CPU. That's what nla_slot() does — it defers the NLA parsing.

Since the parser primarily worked with RTNL, so I treated NLA:s as a simple containers with one or more simple format values (a string or a struct of ints), the NLA:s names are well known, so there is no need even to save the NLA:s headers (they are stripped by the parser).

…

This may be changed, but we should maintain some backwards compatibility:

get_attr() must return an NLA value
get(...) must return a field or an NLA specified by a path ('NLA_NAME', 'NLA_NAME', ...)
__getitem__() must return a simple field value

Basically that's all, but I'm working now to provide a simple API for pluggable parsers — so one could change the existing parser and reuse it, or write and use something own.

Aug 01 '22 19:08 svinota

So any proposals and PR:s are more than just welcome.

Aug 01 '22 19:08 svinota

Makes sense. I'll see if I can get it to work storing additional data in a bit easier to parse format. I think part of it was the way that repr works when simply trying to print the entire structure.

Aug 01 '22 23:08 jacob-keller

@jacob-keller some docs on the parser, that may be relevant to the topic: https://beta.pyroute2.org/parser.html

The code documented is in the master branch, to be released with 0.7.3.

Aug 05 '22 14:08 svinota

pyroute2 pyroute2 copied to clipboard

implementing support for generic netlink policy dump

pyroute2
pyroute2 copied to clipboard