http-parser
http-parser copied to clipboard
Introduce raw data callbacks
These callbacks can be used when developing transparent HTTP traffic analysis and filtering. The main goal of it is to handle transmitted data as-is, but having ability to distinguish data between headers, bodies and different requests.
Thank you for submitting this! May I ask you to write a short example in C to demonstrate how this API could be used?
I just figured out that we need to change position of callbacks in http_parser_settings structure to keep it backward-compatible. I'll update PR and then will do short example.
...also, i guess it's better leave it as is. It's better to get compile-time error, then having mysterious crashes of uninitialized callbacks.
Short example on how it can be used:
std::string hdr;
static int headers_raw(http_parser*parser, const char *at, size_t length)
{
hdr.append(at, length);
}
static int headers_complete(http_parser*parser)
{
do_anything_when_headers_received();
send(fd, hdr.c_str(), hdr.size(), 0);
hdr.clear();
}
static int body_raw(http_parser*parser, const char *at, size_t length)
{
send(fd, at, length, 0);
}
But when developing complex processing, we need keep in mind, that on_header_raw()
callback can be received before on_message_begin()
. It's easy to reproduce - just send alone newline symbol before sending request. It happens because of non-caching nature of HTTP parser.
The numerous style errors alone make this pull request unsuitable for inclusion in its current shape. As to the functionality, I have no strong opinion on whether it's a good addition.
Something to keep in mind is that http_parser is more strict in what it accepts than most browsers. If you're going to use it for MitM-style analysis or filtering, you may have a hard time.
@bnoordhuis, I don't insist, but these additions extend the limits of applicability of the parser. Writing transparent proxy with support of keep-alive and pipelining very difficult (if possible at all) without raw data callbacks. Using raw callbacks it becomes quite easy.
I'll let @indutny make the call on whether or not to accept this pull request.
Writing transparent proxy with support of keep-alive and pipelining very difficult (if possible at all) without raw data callbacks.
Is that really true? It's my experience that transparent proxies still frequently need to rewrite parts of the request and the response.
@bnoordhuis, there are different tasks for proxying. Main goal of raw callbacks is to keep control of request/response sequence without need of reassembling stream before forwarding.
@RandoMan70 I don't really object to this change, but on other hand try to avoid touching that code. It is very sensitive code path, which may ruin the performance of the method because of high register pressure.
What do you think about doing this under some build-time define? Clearly this feature is not something that the most of the users will use, so it will make more sense to do it this way.
Just store that here: https://github.com/OISF/libhtp/pull/123 exactly the same, but in libhtp!
@RandoMan70 can you write me a message ? [email protected]
@indutny it's possible, sure. But don't sure that it can significantly affect performance - checks executed mosly once per call and per request/response switch.
@bnoordhuis Any chance that this pull-request ever will be merged into master? This is exactly what i'm looking for as well.
Not in v2.x since it changes the ABI. Maybe in 3.0 if and when that is released.
(It could be put behind a build-time flag but that's mighty inconvenient for some downstream users. Distros already have a hard enough time with the strict and non-strict builds.)
Any update on adding this pull request, in a release?
There are no concrete plans for a v3.x release so far.
@HAT-DK It isn't mentioned in the readme, but nodejs is moving away from http-parser to llhttp. If you're interested, I could guide you through porting this PR to it.
@asvetlov
Seems, aiohttp should move to this parser too :)
I have create an issue for that: https://github.com/aio-libs/aiohttp/issues/3561