python-netflow-v9-softflowd
Added handling for cases where padding is included in IPFIX Data Sets.
Hi,
I have added handling for the padding that exporters are allowed to include in IPFIX Data Sets as per RFC 7011. I stumbled upon the problem when an exporter on Cisco equipment sent some data sets with padding in them. Would be glad if you could approve this PR! :)
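For context, a minimal sketch of the situation being fixed (hypothetical helper names, not this library's actual API): an exporter pads a Data Set, and a parser of fixed-length records must treat the trailing bytes as padding rather than as a truncated record.

```python
import struct

def parse_data_set(payload: bytes, record_length: int):
    """Parse fixed-length records from an IPFIX Data Set body,
    ignoring any trailing padding the exporter may have added
    (RFC 7011 permits padding octets in Sets).
    Hypothetical helper, not the library's actual API."""
    records = []
    offset = 0
    # Any leftover bytes shorter than one record must be padding,
    # since RFC 7011 requires padding to be smaller than any record.
    while len(payload) - offset >= record_length:
        records.append(payload[offset:offset + record_length])
        offset += record_length
    return records

# Two 4-byte records followed by 3 bytes of zero padding:
payload = struct.pack("!II3x", 1, 2)
records = parse_data_set(payload, 4)  # 2 records, padding skipped
```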
Hi @TK-Khaw, thank you so much for your contribution! I'll test it in the next few days and proceed with the merge after validation. It'll probably be released in a new minor package version
Thanks :)
After reviewing the changes we might need to add some more. The change from self._templates[template_record.template_id] = template_record.fields to self._templates[template_record.template_id] = template_record breaks the tests. I guess you changed this line to later call no_padding_last_offset = self.header.length - template.get_length()?
Padding was already on the TODO list in IPFIXOptionsTemplateRecord... just forgotten 😅 See https://github.com/bitkeks/python-netflow-v9-softflowd/blob/05c23ce9a97554ee298bac1caa49550397334c52/netflow/ipfix.py#L700
Maybe we can insert your fix there?
Hi, I've just read back through my code and realized that there might still be a bug in the current commit, as I used the "length" of the template record, not the actual length of each data record.
I could include the expected length of a data record in the TODO section you mentioned, as a member of IPFIXOptionsTemplateRecord or IPFIXTemplateRecord named data_length, for instance. However, we would still need access to the members of the template record, which means we would still need to store template_record in self._templates instead of template_record.fields.
The reason is that we need the expected length of a data record in order to determine the length of the padding introduced in the set.
Another way to achieve the same effect while storing template_record.fields instead of template_record would be to calculate the expected data record length from the fields themselves, right before parsing the data records.
For instance, like this:

template = templates.get(self.header.set_id)  # This is template_record.fields from before.
data_length = 0
for field in template:
    data_length += field.length
no_padding_last_offset = self.header.length - data_length
while no_padding_last_offset > offset:
    ...
So do we do this instead?
Commit 3c4f8e6 includes padding handling in IPFIX sets. What do you think?
The # TODO: if padding is needed, implement here inside the records was actually a misunderstanding of the spec. Only the sets contain padding, which can be detected by checking for 0x0 bytes. The way it's now implemented should resolve the issue without needing to store template_record.
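The zero-byte check could be sketched like this (hypothetical helper, assuming the remaining bytes of the set are available as a slice):

```python
def is_only_padding(remaining: bytes) -> bool:
    """Return True if the remaining bytes of a set are all 0x0,
    i.e. padding as allowed by RFC 7011.
    Hypothetical helper, not the library's actual code."""
    return len(remaining) > 0 and all(byte == 0 for byte in remaining)

is_only_padding(b"\x00\x00\x00")  # padding only
is_only_padding(b"\x00\x01")      # still contains record data
```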
Hi, I have introduced the changes I mentioned previously in https://github.com/bitkeks/python-netflow-v9-softflowd/pull/34/commits/28012e3d0484db517b5fab4908e2f34ba088c574.
This code leaves template_record as it is, while still accounting for padding.
I see that you have introduced padding detection by checking the remaining bytes in the set. While I believe that is a valid solution as well, since it conforms to the spec, I am a bit concerned that https://github.com/bitkeks/python-netflow-v9-softflowd/commit/3c4f8e62892876d4a2d42288843890b97244df55 essentially sums over all the remaining bytes before every record. This seems unnecessary, since we can infer the last possible record offset of a padding-free set once, before parsing any records in the set, and then simply compare the current offset against that threshold on each iteration, just like what we are trying to introduce with this PR.
Maybe I'm not that good with words, hopefully the picture below might help me illustrate my point. 😅
| ---1st--- | ---2nd--- | ... | <---last--- |         <-- hypothetical starting offset of the last record if the set had no padding
| ---1st--- | ---2nd--- | ... | ---last--- | 000 |    <-- encountered set with padding at the end
                                ^  ^
                                |  no_padding_last_offset
                                starting offset in the last iteration
Note that in RFC7011 it is specifically mentioned that the padding length MUST be shorter than any allowable record in the set.
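The offset-based approach described above can be sketched as follows (hypothetical names; for simplicity the 4-byte set header is left out, so offsets are relative to the set body):

```python
def parse_records(set_body: bytes, set_length: int, record_length: int):
    """Parse records from a data set, stopping once fewer than
    record_length bytes remain -- those can only be padding, because
    RFC 7011 requires padding to be smaller than any record in the set.
    Hypothetical sketch of the approach discussed in this PR."""
    # Computed once, before the loop: the last offset at which a
    # full record could still start if the set had no padding.
    no_padding_last_offset = set_length - record_length
    offset = 0
    records = []
    while offset <= no_padding_last_offset:
        records.append(set_body[offset:offset + record_length])
        offset += record_length
    return records

# 11-byte body: two 4-byte records plus 3 bytes of zero padding.
body = b"\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00"
records = parse_records(body, len(body), 4)  # stops before the padding
```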
Hi, finally got to it.
Cherry-picked your change, thanks! The calculation in 28012e3 was missing one factor: if there are multiple records in the set, then the dataset_length must be added multiple times to no_padding_last_offset. It's the same for template sets and options template sets (there I tried an easier calculation with if self.header.length - offset <= 8).
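The template-set stop condition could be sketched like this (a hypothetical standalone parser, not the library's actual code; the 8-byte threshold is an assumption here, roughly the size of a minimal template record: ID, field count, and one field specifier):

```python
import struct

def parse_template_set(body: bytes, set_length: int):
    """Sketch of the padding check discussed for (options) template
    sets: stop once too few bytes remain to hold another record.
    A minimal template record needs at least ID (2) + field count (2)
    + one field specifier (4) = 8 bytes, so anything shorter left over
    can only be padding. Hypothetical code, not the library's."""
    offset, templates = 0, {}
    while set_length - offset >= 8:  # enough bytes for a minimal record?
        template_id, field_count = struct.unpack_from("!HH", body, offset)
        offset += 4
        fields = []
        for _ in range(field_count):
            field_type, field_length = struct.unpack_from("!HH", body, offset)
            fields.append((field_type, field_length))
            offset += 4
        templates[template_id] = fields
    return templates

# One template (ID 256, two 4-byte fields) followed by 2 padding bytes:
body = struct.pack("!HHHHHH", 256, 2, 1, 4, 2, 4) + b"\x00\x00"
templates = parse_template_set(body, len(body))  # padding ignored
```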
See commit 51ce4eaa268e4bda5be89e1d430477d12fc8a72c
Your solution for template and option template sets looks good to me!
I see you are computing the offset where the padding starts in a data set in https://github.com/bitkeks/python-netflow-v9-softflowd/commit/51ce4eaa268e4bda5be89e1d430477d12fc8a72c. I think you misunderstood my intention for the calculation of no_padding_last_offset in https://github.com/bitkeks/python-netflow-v9-softflowd/commit/28012e3d0484db517b5fab4908e2f34ba088c574. I understand that there may be multiple records in a set, but I never intended to compute the offset where the padding starts in the first place.
What I intended to do in https://github.com/bitkeks/python-netflow-v9-softflowd/commit/28012e3d0484db517b5fab4908e2f34ba088c574 was to check whether there are still enough bytes left to parse another data record. Thus, as long as offset is larger than no_padding_last_offset, we can be confident that no more data records can be "squeezed" out of the remaining bytes, which constitutes a sufficient condition for quitting the loop.
Maybe the misunderstanding arose due to a misnomer on my part.
Closing this PR because your changes were included in the squash merge bb0ab89615d25fe0fe7853c7dfc92d9a42b72131. Thanks 🙂