flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

[WIP] Feature: Wireshark Dissector Generator

Open jtdavis777 opened this issue 7 months ago • 3 comments

Hello! This is an awesome project and I'm eager to share some work I think would greatly benefit the flatbuffers community.

Description

This PR is my attempt to create a full-featured Wireshark dissector generator (see #153 and #8333).

This implementation is based on bfbs_gen_lua.h|cpp and relies only on reflection to retrieve all the necessary data. One thing I attempted to do is put all of the actual lua logic into supporting files in the wireshark/ directory - it was much easier to iterate on the code in this way. Almost all generated code boils down to "call this generic function with these specific parameters".

The reason why I built this wholly separate from the existing lua generator, is because the existing generator hides a lot of the metadata behind hard-coded locals, and I think the output of this PR could be used as a great hands-on teaching tool for flatbuffers internals for those of us curious (or masochistic) enough to dive into the details.

File naming convention is a little weird for generated/supporting files as Wireshark itself has some odd file loading quirks (see wireshark/README.md

Output currently has two modes, regular and verbose. Regular is intended to just shows you the data you want to see:

image

Verbose is intended to tell you what every single byte (except for padding) in a buffer is used for:

image

*(both of these screenshots are of auto generated data full of gibberish - no, you aren't supposed to be seeing "real" strings)

This PR is still a WIP and I intend to edit this description down to just the relevant information once I handle the last few features/issues. I wanted to get this up now to collect feedback and get some questions answered.

Still TODO:

  • [x] Generate standalone dissector objects for root types (so you can point Wireshark at your proto and dissect without writing any lua, if your packet is pure flatbuffer data)
  • [ ] Support multiple root_type objects in the plugins directory
  • [ ] Support vector of union
    • [ ] Seems blocked by bfbs generation - reflection of _type is still UType not vector of UType
  • [ ] Support displaying non empty default strings and default vectors
    • [ ] Also seems blocked by bfbs generation - non scalar defaults not populated in reflection
  • [ ] Support proper display of bit_flags
    • [ ] blocked by attributes not being sent to reflection - could workaround this by requiring --bfbs-builtins
  • [ ] support automatically parsing nested_flatbuffer
    • [ ] blocked by attributes not being sent to reflection - could workaround this by requiring --bfbs-builtins
  • [ ] Verify/fix windows paths
  • [ ] Support Wireshark 3 - requires require statements
  • [ ] Add a healthy dose of comments
  • [ ] Squash everything into a single commit
  • [ ] Handle (relevant) attributes (if possible?)
  • [ ] Testing? Unsure what automated testing would look like for this feature. I have worked to manually verify every feature I could. I'm sure I've missed plenty.
  • [ ] Documentation

Open Questions

  1. Is there a canonical way to handle paths between operating systems in flatbuffers? I am trying to use full paths as namespacing for generated files. I might have missed something simple here
  2. Am I reinventing the wheel anywhere? Am I doing anything bone headed that is super trivial to do with tools I missed?
  3. What would be required documentation to add to get this feature merged?
  4. Does anyone have any better ideas for file naming conventions? I went with a dumb, straight forward approach, but it doesn't exactly jibe with the rest of the repository.

Possible Future Work

  1. built-in gRPC support. This might be as simple as either doing nothing or adding some simple extra code to the generated root types, but I'm not familiar enough with gRPC to integrate this right now. I think nothing stops one from writing a custom lua dissector that calls both the gRPC built in dissector then one of these generated ones.
  2. flexbuffer support. No clue what this would look like, as I have no experience with flexbuffers, and would also likely need some attribute visibility modifications to be able to act on them.
  3. allow users to specify if they want speicific number fields displayed in other formats (hex or binary)
  4. The current architecture does not support filtering vectors/arrays of scalars in Wireshark's filter bar as these do not have their own specific protocol field types. (i.e. you can't say "show me packets which have a value of 5 in this array"). This may be a minor update I can tackle in the initial release.

jtdavis777 avatar Apr 10 '25 02:04 jtdavis777

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar Apr 10 '25 02:04 google-cla[bot]

Found a small bug when the payload doesn't start at the beginning of the buffer, and another small issue with parse_struct. Additionally, loading is a little weird on Wireshark < 4.0. Will address these later today.

jtdavis777 avatar Apr 13 '25 19:04 jtdavis777

Currently tracking a limitation where you can only have one dissector present in the plugins directory at a time. Wireshark doesn't like you sharing common ProtoFields apparently.

jtdavis777 avatar Apr 14 '25 22:04 jtdavis777

hey @dbaileychess or @aardappel -- would either of you be willing to provide some insight on the open questions on my PR? :) I would really appreciate the feedback as this addition has been very useful and I would like to continue improving it.

jtdavis777 avatar Jul 24 '25 01:07 jtdavis777

I have no experience with Wireshark, but this seems cool functionality to have.

The reason why I built this wholly separate from the existing lua generator, is because the existing generator hides a lot of the metadata behind hard-coded locals, and I think the output of this PR could be used as a great hands-on teaching tool for flatbuffers internals for those of us curious (or masochistic) enough to dive into the details.

I think that is a high cost decision, both in terms of having a lot of duplicated/unshared code with the main Lua implementation, and also that using reflection (esp in a language like Lua) instead of hardcoded accessors is going to make things reaaally slow.

That to me would seem to outweight the benefits.

We generally are wary of adding a lot of new code, especially duplicated code, to the project because at some point someone who is not the original contributor will end up paying for the maintenance cost. Now Lua changes likely have to be made in 2 places. For example, at some point we ditched our JavaScript generator in favor of compiling thru the Typescript generator because of maintenance cost.

As such, all of this would be a lot more palatable if it was instead an option to the Lua generator that keeps as much shared as possible.

Is there a canonical way to handle paths between operating systems in flatbuffers? I am trying to use full paths as namespacing for generated files. I might have missed something simple here

I think there are some functions in util.cpp/.h that we use for this purpose thruout the codebase.

Am I reinventing the wheel anywhere? Am I doing anything bone headed that is super trivial to do with tools I missed?

Again not familiar with Wireshark and how this would be used, but see the above.

What would be required documentation to add to get this feature merged?

A similar single .md files to other implementations would be sufficient. Better yet, a section in the Lua docs as per the above.

Does anyone have any better ideas for file naming conventions? I went with a dumb, straight forward approach, but it doesn't exactly jibe with the rest of the repository.

Not sure :)

aardappel avatar Jul 25 '25 23:07 aardappel

@aardappel thank you for the thorough response!

I think that is a high cost decision, both in terms of having a lot of duplicated/unshared code with the main Lua implementation, and also that using reflection (esp in a language like Lua) instead of hardcoded accessors is going to make things reaaally slow.

I certainly agree that this feature adds a lot of cost -- before I continue I should clarify -- when I mentioned relies only on reflection to retrieve all the necessary data I was referring to the generator itself -- I implemented my generator as a bfbs gen instead of an idl gen. The generated code is much like the original Lua implementation where offsets are hard coded and the code is quite performant. I do not perform any runtime reflection for parsing buffers. I will update the PR description to be more clear on this point.

I would love to use or extend the current lua generator to accomplish this task.. unfortunately I think that wireshark's.... "creative" lua integrations make this not feasible (especially for the level of information I want to expose to wireshark).

I realized (belatedly) that this PR implements something much closer to the flatc --annotate function than actual code gen for creation and parsing, if that helps thinking about this PR. :)

jtdavis777 avatar Aug 08 '25 12:08 jtdavis777

Well, it would be feasible to integrate the two, the question is if that would be better or worse. If the integrated version is just a ton of if-then without much shared code, then indeed a separate implementation like your current one is better.

That is hard for me to judge though.. I'd like some other opinions on this PR. Would be great to hear from @dbaileychess and others with Lua or codegen experience.

aardappel avatar Aug 08 '25 17:08 aardappel