Some API suggestions
Hello
I was trying to use peel-ip for something and hit some setbacks. I was thinking a bit about how to fix them, and identified a few areas in the API that are actually in the „base“ peel crate. So I'd like to bring them up for discussion, to see if my ideas make sense.
In general, the idea behind the crate is I think great and it allows powerful parsing. I would like to help out with it, but first I want to know if it makes sense. So, what is your opinion on them? Also, the project seems a bit inactive, do you consider it just dormant for a while, or is it abandoned?
Documentation
There are examples of how to build the parser or how to parse something, but digging through the documentation for a way to actually get the results afterwards was a bit challenging. It would be great if the examples included extracting the results and didn't stop at parsing the thing.
Uncomfortable downcasting
When I parse something, get the results and then want to look into what I got, I get a bunch of Any values. So I have to go through them, guess what type they might be and try downcasting. This is a lot of boilerplate.
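To illustrate the kind of boilerplate meant here, a minimal sketch using only `std::any::Any` from the standard library. The result types `Ipv4Header` and `TcpHeader` and the helpers `describe` and `sample_results` are made up for this example and are not peel's actual API.

```rust
use std::any::Any;

// Hypothetical result types standing in for what real parsers might return.
#[derive(Debug, PartialEq)]
pub struct Ipv4Header {
    pub ttl: u8,
}

#[derive(Debug, PartialEq)]
pub struct TcpHeader {
    pub dest_port: u16,
}

/// Describe one type-erased result by trying each known type in turn.
pub fn describe(result: &dyn Any) -> String {
    if let Some(ip) = result.downcast_ref::<Ipv4Header>() {
        format!("IPv4, ttl {}", ip.ttl)
    } else if let Some(tcp) = result.downcast_ref::<TcpHeader>() {
        format!("TCP, port {}", tcp.dest_port)
    } else {
        // Every type the caller did not guess ends up here.
        "unknown".to_string()
    }
}

/// Build a result list shaped like what a parse might hand back (made up).
pub fn sample_results() -> Vec<Box<dyn Any>> {
    vec![
        Box::new(Ipv4Header { ttl: 64 }),
        Box::new(TcpHeader { dest_port: 80 }),
        Box::new(42u32),
    ]
}
```

Every new result type adds another `else if` branch, and nothing tells the caller which types are worth trying.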
I see why it is there and how it is needed for flexibility, but there could be something done about it:
- Adding a type hint ‒ the new_parser would take the parser and some other value. The value type would be generic (but the same for the whole parser tree). The hint would be then returned with the boxed Any and the caller could match on the hint and dispatch to other functions with some more comfort.
- The Box<Any> could be a template parameter, so if there was a fixed set of parser results, the user could use an enum, or some other trait object that actually has some useful methods. Of course, such parser tree could take only parsers that return such type of results (or something convertible to it with into).
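Both suggestions can be sketched roughly as follows. This is only an illustration of the shape of the idea, not a proposal for concrete signatures: `Tagged`, `Layer`, `IpInfo`, `TcpInfo`, `Parsed`, `summarize` and `collect` are all hypothetical names, and how peel would actually thread the hint through is left open.

```rust
use std::any::Any;

/// First suggestion: a type-erased result paired with a caller-chosen hint.
pub struct Tagged<H> {
    pub hint: H,
    pub value: Box<dyn Any>,
}

/// Hypothetical hint type chosen by the user of the parser tree.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Layer {
    Ip,
    Tcp,
}

pub struct IpInfo {
    pub ttl: u8,
}

pub struct TcpInfo {
    pub port: u16,
}

/// Dispatch on the hint first; the downcast is then one expected step
/// instead of a chain of guesses. A mismatching hint yields `None`.
pub fn summarize(r: &Tagged<Layer>) -> Option<String> {
    match r.hint {
        Layer::Ip => r.value.downcast_ref::<IpInfo>().map(|i| format!("ip ttl={}", i.ttl)),
        Layer::Tcp => r.value.downcast_ref::<TcpInfo>().map(|t| format!("tcp port={}", t.port)),
    }
}

/// Second suggestion: the result type is a parameter `R`, and each parser
/// output only has to be convertible into it.
pub enum Parsed {
    Ip(IpInfo),
    Tcp(TcpInfo),
}

impl From<IpInfo> for Parsed {
    fn from(i: IpInfo) -> Self {
        Parsed::Ip(i)
    }
}

impl From<TcpInfo> for Parsed {
    fn from(t: TcpInfo) -> Self {
        Parsed::Tcp(t)
    }
}

/// Store any parser output that converts into the chosen result type.
pub fn collect<R, T: Into<R>>(out: &mut Vec<R>, value: T) {
    out.push(value.into());
}
```

With the enum variant the caller can `match` exhaustively and needs no downcast at all, at the cost of fixing the set of result types up front.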
Need to know the parents
This is best described by the problem I had. I started capturing packets with the pcap crate. When I captured on a real interface, everything was good. But then I used tun0 and discovered there are no ethernet frames and the packets start with the IP header. So I built a parser tree manually which started with a fake parser that did nothing (so I had a single root), and placed IPv4 and IPv6 parsers below that. They didn't work, because IPv4 looked at what the previous result was and expected either another IPv4 or ethernet.
I solved the problem by faking the ethernet result so the IPv4 parser was happy. But this is wrong, both because it is a lie and because IPv4 can sit on many many more things than just ethernet (to list some, it can be in IPv6 which is not handled).
The reason it looks into the previous result is because it wants to know if it should be parsing itself or if something else is there, according to some field in the previous layer.
So I was thinking, instead of looking into (possibly unknown) previous result type, why not add some kinds of „next expected thing“ hints (also generic). Then I wouldn't have to fake the ethernet header, but could simply provide the hint of type expected by the IPv4 parser. This would also allow splitting the peel-ip into multiple crates (when I'm interested in IP packets or TCP packets, I don't really need to have all http parsing compiled in).
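As a rough sketch of what such a hint could look like from the parser's side: the IPv4 parser below decides from a hint value alone and never inspects an ethernet result. `NextProto`, `Ipv4Summary` and `parse_ipv4` are hypothetical names for this example; only the IPv4 header layout (version in the high nibble of byte 0, header length in 32-bit words in the low nibble, protocol number in byte 9) is taken from the protocol itself.

```rust
/// Hypothetical hint a lower layer (or the caller) hands to the next parser.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NextProto {
    Ipv4,
    Ipv6,
}

#[derive(Debug, PartialEq)]
pub struct Ipv4Summary {
    pub header_len: usize,
    pub protocol: u8,
}

/// Parse only when the hint says IPv4; no knowledge of ethernet is needed.
pub fn parse_ipv4(hint: NextProto, data: &[u8]) -> Option<Ipv4Summary> {
    // 20 bytes is the minimum IPv4 header size.
    if hint != NextProto::Ipv4 || data.len() < 20 {
        return None;
    }
    // The version is the high nibble of the first byte.
    if data[0] >> 4 != 4 {
        return None;
    }
    // The header length is the low nibble, counted in 32-bit words.
    let header_len = usize::from(data[0] & 0x0f) * 4;
    if header_len < 20 || data.len() < header_len {
        return None;
    }
    Some(Ipv4Summary { header_len, protocol: data[9] })
}
```

When capturing on tun0, the caller would pass `NextProto::Ipv4` (or `Ipv6`) directly instead of constructing a fake ethernet result.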
What I didn't think through yet is if it is possible to have different hints on different edges (matching, but different ones), like the hint passed from L2 to IPv4/IPv6 being different than the hint passed from IPv4 to TCP.
Sorry for such a long text, but I believe Rust should have an easy to use and somewhat complete infrastructure for decoding packets and protocols, and I think this is the closest to that goal.
Thank you very much for the feedback. I will have a deeper look into it later on. 👍
Also, the project seems a bit inactive, do you consider it just dormant for a while, or is it abandoned?
No, the project is not abandoned, but I am currently not actively developing it. This does not mean that I do not accept pull requests or any other kind of contribution. :) If there is interest in further development, I will for sure contribute too.
There are examples how to build the parser or how to parse something, but digging through the documentation for a way to actually get the results afterwards was a bit challenging. It would be great if the examples included extracting the results and didn't stop by parsing the thing.
Sure, updating the documentation for this purpose clearly makes sense.
Adding a type hint ‒ the new_parser would take the parser and some other value. The value type would be generic (but the same for the whole parser tree). The hint would be then returned with the boxed Any and the caller could match on the hint and dispatch to other functions with some more comfort.
Sounds interesting, especially in performance comparison to "hint" vs. dynamic cast check. It could be optional which would make sense.
The Box<Any> could be a template parameter, so if there was a fixed set of parser results, the user could use an enum, or some other trait object that actually has some useful methods. Of course, such parser tree could take only parsers that return such type of results (or something convertible to it with into).
In general this sounds reasonable to me. 👍
So I was thinking, instead of looking into (possibly unknown) previous result type, why not add some kinds of „next expected thing“ hints (also generic). Then I wouldn't have to fake the ethernet header, but could simply provide the hint of type expected by the IPv4 parser. This would also allow splitting the peel-ip into multiple crates (when I'm interested in IP packets or TCP packets, I don't really need to have all http parsing compiled in).
The idea of splitting up the crates sounds good. It could also be possible to set up the overall parser via a (maybe external) configuration step, which sets up the possible paths with their hints. So it would be possible to create a very custom parser, e.g. an http parser for just some use cases.
In general thank you again for the feedback. We can put all these things in issues and start working on it if you want.
Thanks for the reply.
I'm afraid the ideas are still a bit too vague to create concrete tasks (maybe with the exception of documentation). How about leaving this open for a while, and I'll try to come up with some more concrete ideas of what the API could look like?
Oh sure, you can also play around with the source and propose unfinished „WIP“ pull requests as a discussion base. :)
Some more thinking:
- Each parser returns a pair: the parsed data (that is converted by .into() into whatever the type of the real result is ‒ either Box<Any> or some other type chosen by the user), and a next hint of some type.
- When adding the parser, the handle (or was it called index?) is parametric by that type.
- When adding an edge, it can be added with a pair handle→handle, in which case the second parser is tried unconditionally, or with a triple handle→handle+hint value. The hint is not passed to the second parser, but the edge is active only if the previous parser returned that given value.
That way different lower parsers can return different values or even types without the parsers/layers on top of them having to care at all. This effectively moves the check outside from the second parser into the tree structure.
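A minimal sketch of how edges guarded by hints could work, under simplifying assumptions: it uses a single hint type `H` for the whole tree and plain indices as handles (the idea above allows a different hint type per parser, which would need more type machinery). `Tree`, `add_parser`, `add_edge`, `add_edge_if`, `traverse` and the toy parsers `link`, `ipv4`, `ipv6` are all hypothetical and not part of peel.

```rust
/// A parser yields its result, a hint for the edges, and the bytes consumed.
pub type Parser<R, H> = fn(&[u8]) -> Option<(R, H, usize)>;

pub struct Tree<R, H> {
    parsers: Vec<Parser<R, H>>,
    // (from, to, required hint); `None` means the edge is unconditional.
    edges: Vec<(usize, usize, Option<H>)>,
}

impl<R, H: PartialEq> Tree<R, H> {
    pub fn new() -> Self {
        Tree { parsers: Vec::new(), edges: Vec::new() }
    }

    /// Register a parser and return its handle (a plain index here).
    pub fn add_parser(&mut self, p: Parser<R, H>) -> usize {
        self.parsers.push(p);
        self.parsers.len() - 1
    }

    /// Edge whose target is always tried.
    pub fn add_edge(&mut self, from: usize, to: usize) {
        self.edges.push((from, to, None));
    }

    /// Edge that is active only if `from` returned exactly this hint.
    pub fn add_edge_if(&mut self, from: usize, to: usize, hint: H) {
        self.edges.push((from, to, Some(hint)));
    }

    /// Run the parser at `start`, then keep following the first active edge
    /// whose target accepts the remaining input, until no edge applies.
    pub fn traverse(&self, start: usize, mut data: &[u8]) -> Vec<R> {
        let mut results = Vec::new();
        let Some((first, mut hint, used)) = self.parsers[start](data) else {
            return results;
        };
        results.push(first);
        data = &data[used.min(data.len())..];
        let mut current = start;
        loop {
            // The hint check lives in the tree, not in the target parser.
            let next = self
                .edges
                .iter()
                .filter(|(from, _, want)| {
                    *from == current && want.as_ref().map_or(true, |w| *w == hint)
                })
                .find_map(|(_, to, _)| self.parsers[*to](data).map(|out| (*to, out)));
            let Some((to, (value, new_hint, used))) = next else {
                return results;
            };
            results.push(value);
            hint = new_hint;
            data = &data[used.min(data.len())..];
            current = to;
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Hint {
    Ipv4,
    Ipv6,
    Done,
}

// Toy parsers: a one-byte "link" header whose value selects the next layer.
pub fn link(d: &[u8]) -> Option<(&'static str, Hint, usize)> {
    match *d.first()? {
        4 => Some(("link", Hint::Ipv4, 1)),
        6 => Some(("link", Hint::Ipv6, 1)),
        _ => None,
    }
}

pub fn ipv4(_: &[u8]) -> Option<(&'static str, Hint, usize)> {
    Some(("ipv4", Hint::Done, 0))
}

pub fn ipv6(_: &[u8]) -> Option<(&'static str, Hint, usize)> {
    Some(("ipv6", Hint::Done, 0))
}
```

Because the upper parsers never look at the previous result, traversal can also start directly at `ipv4`, which is the tun0 case without any faked ethernet layer.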
If that makes sense, the next step is for me to find some time to actually write it O:-).
Hm yeah, let's see how the compiler restricts such an implementation. If it does not, it should be worth adding a test implementation.