pyroute2 icon indicating copy to clipboard operation
pyroute2 copied to clipboard

implementing support for generic netlink policy dump

Open jacob-keller opened this issue 2 years ago • 14 comments

Recent kernels have gained support for dumping policy information about generic netlink interfaces.

I want to add support for the CTRL_CMD_GETPOLICY to enable dumping such policy information.

Unfortunately, the way this policy information is dumped causes some parsing problems.

Specifically, the structure is something like

nla_nest_start( CTRL_ATTR_OP_POLICY ) -> nla_nest_start ( command_id ) -> nla_u32( CTRL_ATTR_POLICY_DO ) OR nla_u32 ( CTRL_ATTR_POLICY_DUMP ) nla_nest_end nla_nest_end

specifically the tricky part is that the command_id is dependent on which netlink family is being dumped. Its the command op codes for all commands available in that family. I don't think the current nla_map bits would work here, because we essentially need to handle any valid opcode here.

I am not sure if there is a way to get some sort of custom handling for a given attribute so we can specify this as a nested attribute in a different way when parsing the message?

jacob-keller avatar Jul 29 '22 19:07 jacob-keller

One can redefine encode and decode methods of nla child classes.

A simple example: https://github.com/svinota/pyroute2/blob/6a80cd3df22942097ba5ce2622fbeac0bba8150f/pyroute2/netlink/generic/wireguard.py#L193-L200

A more complicated example: https://github.com/svinota/pyroute2/blob/6a80cd3df22942097ba5ce2622fbeac0bba8150f/pyroute2/netlink/nl80211/init.py#L752-L801

In a custom encode/decode methods you can do literally anything with the buffer, but take a look at the existing code, since the API may be a bit weird (hope we turn it into something more reasonable one day)

svinota avatar Jul 29 '22 20:07 svinota

I'm having trouble figuring out how to do something here. I can override the decode function but its not really clear what to put there.

The problem is that what I want is basically to overwrite the nla_map logic and just allow any attribute type to be the nesting attribute. I could hard code it by just making a pre-generated nla_map from 0 up to the maximum attribute size, but that seems like super overkill...

I don't really want to have to duplicate the decode logic, and I don't know how to safely extract the attribute during decoding in a way that doesn't break the rest of the parsing. I also still want the sub-attributes to get decoded nicely and placed in as such.

jacob-keller avatar Jul 29 '22 22:07 jacob-keller

Let me dig a bit deeper into the kernel code on this weekend, let's see what we can do. At least you know what to expect there, so I will ask you here for details in the case of any questions.

I believe we solve that by Tuesday the latest.

svinota avatar Jul 29 '22 22:07 svinota

I did get something sort of hacky working, which I'll make as a PR. I basically just make comprehensions to fill in nla_map. It's pretty ugly, and I think a proper policy dump would want to split the attributes out into more useful data structures.

jacob-keller avatar Jul 29 '22 23:07 jacob-keller

Here's what I managed so far: https://github.com/svinota/pyroute2/pull/981

I think what we really want is some sort of option to basically say "this will have an id-based nested attribute which has subattributes of this policy". This is type of nested attribute is done 3 different times in the policy dumping code including:

  1. to specify which opcodes have which policyt
  2. to specify a policy index, i.e. when a family has multiple policies (one per command typically)
  3. to specify the attribute a policy is for

It would be nice to have a generic facility which you can set some sort of name for the attribute number as well as what sub-class to use for parsing the nested attributes underneath.

jacob-keller avatar Jul 29 '22 23:07 jacob-keller

Got it, thanks a lot for the PR!

Yep, I need to fix the NLA parser, and I believe I know in what way.

svinota avatar Jul 30 '22 10:07 svinota

@jacob-keller here is the plan:

  • [x] merge #981
  • [x] fix code format
  • [x] merge #982
  • [x] fix the code to use the new API

svinota avatar Jul 31 '22 20:07 svinota

Ok, it should be ready now. Take a look when you have a minute.

https://github.com/svinota/pyroute2/blob/6a13ef8dc6086eb43179a1e4c7e0af6d2e55cf36/pyroute2/netlink/init.py#L2458-L2482

svinota avatar Jul 31 '22 22:07 svinota

That's great! I think the only other parsing issue is the ability to extract the value back out when analyzing the policy data later. Perhaps we could also add an attr_id field to the dictionary so that you could directly extract the ID? Come to think of it, that would mean you don't need a unique 'name' for each attribute in that case which would make parsing even easier I think.

jacob-keller avatar Aug 01 '22 17:08 jacob-keller

Actually looked at this closer and I am very confused by the way nla_slot works and how the NLAs get represented.

I would have expected something like a list of attributes, each attribute with a name, type, flags, and value. Then the value itself could either be a raw low-level value, or its own nla.

If the nested attribute properly marks itself as a NLA_F_NEST the parsing could automatically parse the nested attribute, otherwise it would need to know its a nested type from the nla_map.

The current "nla_slot" represents this as a tuple which is very confusing, when it would make more sense to represent it more like a structure with a dictionary of the relevant fields instead?

It also makes printing weird because the nla_slot uses repr which condenses it down to a tuple and makes adding an attribute difficult because we'd have a weird issue with being able to confuse the tuple slots.

I'm not sure what the backwards compatibility issues this would cause to change the format so much though..

jacob-keller avatar Aug 01 '22 18:08 jacob-keller

Historically the parsed structure was like that:

{
  'header': {
    'length': ...,
    'type': ...,
    'flags': ...,
    'sequence_number': ...,
    'pid': ...,
    'error': ...,
  },
  'field1': ...,
  'field2': ...,
  'attrs': [
    ('NLA_NAME', nla),
    ...,
  ],
}

So the simple field can be get with msg['field1'], while NLA:s can be accessed via msg['attrs'][...] or msg.get_attr('NLA_NAME'), or msg.get(('NLA_NAME', 'GRANDCHILD_NLA_NAME, ...))`.

An extremely huge statistics NLA:s in RTNL link messages and a slow parser leaded to a solution, where NLA:s are parsed on demand — only when you try to access them. Thus saving some CPU. That's what nla_slot() does — it defers the NLA parsing.

Since the parser primarily worked with RTNL, so I treated NLA:s as a simple containers with one or more simple format values (a string or a struct of ints), the NLA:s names are well known, so there is no need even to save the NLA:s headers (they are stripped by the parser).

This may be changed, but we should maintain some backwards compatibility:

  1. get_attr() must return an NLA value
  2. get(...) must return a field or an NLA specified by a path ('NLA_NAME', 'NLA_NAME', ...)
  3. __getitem__() must return a simple field value

Basically that's all, but I'm working now to provide a simple API for pluggable parsers — so one could change the existing parser and reuse it, or write and use something own.

svinota avatar Aug 01 '22 19:08 svinota

So any proposals and PR:s are more than just welcome.

svinota avatar Aug 01 '22 19:08 svinota

Makes sense. I'll see if I can get it to work storing additional data in a bit easier to parse format. I think part of it was the way that repr works when simply trying to print the entire structure.

jacob-keller avatar Aug 01 '22 23:08 jacob-keller

@jacob-keller some docs on the parser, that may be relevant to the topic: https://beta.pyroute2.org/parser.html

The code documented is in the master branch, to be released with 0.7.3.

svinota avatar Aug 05 '22 14:08 svinota