http-parser icon indicating copy to clipboard operation
http-parser copied to clipboard

Introduce raw data callbacks

Open RandoMan70 opened this issue 8 years ago • 17 comments

These callbacks can be used when developing transparent HTTP traffic analysis and filtering. The main goal of it is to handle transmitted data as-is, but having ability to distinguish data between headers, bodies and different requests.

RandoMan70 avatar Feb 20 '16 06:02 RandoMan70

Thank you for submitting this! May I ask you to write a short example in C to demonstrate how this API could be used?

indutny avatar Feb 20 '16 06:02 indutny

I just figured out that we need to change position of callbacks in http_parser_settings structure to keep it backward-compatible. I'll update PR and then will do short example.

...also, i guess it's better leave it as is. It's better to get compile-time error, then having mysterious crashes of uninitialized callbacks.

RandoMan70 avatar Feb 20 '16 06:02 RandoMan70

Short example on how it can be used:

std::string hdr;

static int headers_raw(http_parser*parser, const char *at, size_t length)
{
    hdr.append(at, length);
}

static int headers_complete(http_parser*parser)
{
    do_anything_when_headers_received();
    send(fd, hdr.c_str(), hdr.size(), 0);
    hdr.clear();
}

static int body_raw(http_parser*parser, const char *at, size_t length)
{
    send(fd, at, length, 0);
}

But when developing complex processing, we need keep in mind, that on_header_raw() callback can be received before on_message_begin(). It's easy to reproduce - just send alone newline symbol before sending request. It happens because of non-caching nature of HTTP parser.

RandoMan70 avatar Feb 20 '16 07:02 RandoMan70

The numerous style errors alone make this pull request unsuitable for inclusion in its current shape. As to the functionality, I have no strong opinion on whether it's a good addition.

Something to keep in mind is that http_parser is more strict in what it accepts than most browsers. If you're going to use it for MitM-style analysis or filtering, you may have a hard time.

bnoordhuis avatar Feb 20 '16 12:02 bnoordhuis

@bnoordhuis, I don't insist, but these additions extend the limits of applicability of the parser. Writing transparent proxy with support of keep-alive and pipelining very difficult (if possible at all) without raw data callbacks. Using raw callbacks it becomes quite easy.

RandoMan70 avatar Feb 20 '16 16:02 RandoMan70

I'll let @indutny make the call on whether or not to accept this pull request.

Writing transparent proxy with support of keep-alive and pipelining very difficult (if possible at all) without raw data callbacks.

Is that really true? It's my experience that transparent proxies still frequently need to rewrite parts of the request and the response.

bnoordhuis avatar Feb 22 '16 10:02 bnoordhuis

@bnoordhuis, there are different tasks for proxying. Main goal of raw callbacks is to keep control of request/response sequence without need of reassembling stream before forwarding.

RandoMan70 avatar Feb 23 '16 10:02 RandoMan70

@RandoMan70 I don't really object to this change, but on other hand try to avoid touching that code. It is very sensitive code path, which may ruin the performance of the method because of high register pressure.

What do you think about doing this under some build-time define? Clearly this feature is not something that the most of the users will use, so it will make more sense to do it this way.

indutny avatar Feb 27 '16 03:02 indutny

Just store that here: https://github.com/OISF/libhtp/pull/123 exactly the same, but in libhtp!

socketpair avatar Mar 08 '16 18:03 socketpair

@RandoMan70 can you write me a message ? [email protected]

socketpair avatar Mar 08 '16 18:03 socketpair

@indutny it's possible, sure. But don't sure that it can significantly affect performance - checks executed mosly once per call and per request/response switch.

RandoMan70 avatar Mar 09 '16 15:03 RandoMan70

@bnoordhuis Any chance that this pull-request ever will be merged into master? This is exactly what i'm looking for as well.

HAT-DK avatar May 03 '18 12:05 HAT-DK

Not in v2.x since it changes the ABI. Maybe in 3.0 if and when that is released.

(It could be put behind a build-time flag but that's mighty inconvenient for some downstream users. Distros already have a hard enough time with the strict and non-strict builds.)

bnoordhuis avatar May 03 '18 13:05 bnoordhuis

Any update on adding this pull request, in a release?

HAT-DK avatar Jan 21 '19 08:01 HAT-DK

There are no concrete plans for a v3.x release so far.

bnoordhuis avatar Jan 21 '19 09:01 bnoordhuis

@HAT-DK It isn't mentioned in the readme, but nodejs is moving away from http-parser to llhttp. If you're interested, I could guide you through porting this PR to it.

indutny avatar Jan 21 '19 10:01 indutny

@asvetlov

Seems, aiohttp should move to this parser too :)

I have create an issue for that: https://github.com/aio-libs/aiohttp/issues/3561

socketpair avatar Jan 21 '19 11:01 socketpair