beast Relax http response parsing?

I'm currently trying to replace a libsoup-based HTTP client implementation with Boost.Beast. However, Beast's HTTP response parser is (deliberately) very strict when it comes to RFC compliance. For example, in contrast to libsoup, it only accepts:

status-line = HTTP-version SP status-code SP reason-phrase CRLF

Unfortunately, various "real world" servers don't strictly play by the rules and, for example, return status lines that are terminated only by an LF. While it would theoretically be best to fix these servers, in practice this is often not possible.
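To make the failure mode concrete, here is a minimal sketch of what a strict check against the grammar above looks like. It is not Beast's parser code; `is_strict_status_line` is a hypothetical helper using only the standard library, and the reason-phrase check is simplified to "no CR or LF inside".

```cpp
#include <cctype>
#include <string_view>

// Hypothetical helper (not part of Beast): true only if `line` matches
//   HTTP-version SP status-code SP reason-phrase CRLF
bool is_strict_status_line(std::string_view line)
{
    auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };

    // HTTP-version = "HTTP/" DIGIT "." DIGIT
    if (line.size() < 8 || line.substr(0, 5) != "HTTP/")
        return false;
    if (!is_digit(line[5]) || line[6] != '.' || !is_digit(line[7]))
        return false;
    line.remove_prefix(8);

    // Exactly one SP, a three-digit status code, exactly one SP.
    if (line.size() < 5 || line[0] != ' ' || line[4] != ' ')
        return false;
    if (!is_digit(line[1]) || !is_digit(line[2]) || !is_digit(line[3]))
        return false;
    line.remove_prefix(5);

    // The line must end in CRLF; a bare LF is rejected here.
    if (line.size() < 2 || line.substr(line.size() - 2) != "\r\n")
        return false;
    line.remove_suffix(2);

    // Simplified reason-phrase check: no stray CR or LF inside it.
    return line.find_first_of("\r\n") == std::string_view::npos;
}
```

With this check, `"HTTP/1.1 200 OK\r\n"` is accepted, while the same line terminated by a bare `"\n"` is rejected, which is the behaviour described above.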

I realize that this (or a similar) topic has already been discussed in the past:

https://github.com/boostorg/beast/issues/1761
https://github.com/boostorg/beast/issues/1138
https://github.com/boostorg/beast/issues/2344

And of course, one could always just patch the parser code as needed. Nevertheless, I'd like to know:

Are there any plans to provide a more relaxed way of parsing? E.g. maybe via some opt-in basic_parser::relaxed() function? Or would something like this be strictly against the spirit of the library?
If adjusting parsing is not an option, what would be the best/easiest way to preprocess/filter incoming data first? To me, the approach outlined in the icy_stream.hpp example seems overly complicated but so far I haven't found a simpler/more obvious approach.
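For the preprocessing option, the core transformation could look roughly like the sketch below: rewrite bare LF terminators to CRLF in the header section and leave the body untouched. `normalize_header_line_endings` is a hypothetical name, the function assumes the whole header block is already in one buffer, and it does not show how to hook this into a stream wrapper in the style of icy_stream (a real filter would have to keep state across partial reads).

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns a copy of `raw` in which every line of the
// header section ends in CRLF. Everything after the first blank line
// (the message body) is copied verbatim.
std::string normalize_header_line_endings(std::string_view raw)
{
    std::string out;
    out.reserve(raw.size() + 16);

    std::size_t pos = 0;
    while (pos < raw.size()) {
        std::size_t lf = raw.find('\n', pos);
        if (lf == std::string_view::npos)
            break;  // incomplete line: copied verbatim below

        std::string_view line = raw.substr(pos, lf - pos);
        if (!line.empty() && line.back() == '\r')
            line.remove_suffix(1);  // line was already CRLF-terminated

        out.append(line);
        out.append("\r\n");
        pos = lf + 1;

        if (line.empty())
            break;  // blank line marks the end of the header section
    }

    out.append(raw.substr(pos));  // body or trailing partial line
    return out;
}
```

The normalized buffer could then be handed to the unmodified strict parser. Note that this deliberately accepts input the strict grammar forbids, so it should be applied only where that trade-off is acceptable.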

Apr 26 '23 18:04 BenKaufmann

No, and the icy_stream example is the way.

Apr 26 '23 23:04 vinniefalco

I see. That's unfortunate and makes switching to beast more complex.

Out of curiosity: is there a particular (technical) reason for not following the suggestions from RFC-2616/section-19.3:

Clients SHOULD be tolerant in parsing the Status-Line and servers tolerant when parsing the Request-Line. In particular, they SHOULD accept any amount of SP or HT characters between fields, even though only a single SP is required.

The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR.
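As an illustration of what that tolerance would mean for the status line, here is a minimal standalone sketch. It is not Beast code; `StatusLine` and `parse_status_line_tolerant` are hypothetical names. It accepts any run of SP or HT between fields and treats a single LF as the terminator, ignoring a CR directly before it.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for the sketch below.
struct StatusLine {
    std::string version;
    int code;
    std::string reason;
};

// Hypothetical tolerant parser following the RFC 2616 section 19.3 advice.
std::optional<StatusLine> parse_status_line_tolerant(std::string_view line)
{
    // Require LF as the terminator; drop an optional CR before it.
    if (line.empty() || line.back() != '\n')
        return std::nullopt;
    line.remove_suffix(1);
    if (!line.empty() && line.back() == '\r')
        line.remove_suffix(1);

    // Drop any run of SP/HT at the front of `line`.
    auto skip_ws = [&line] {
        std::size_t n = line.find_first_not_of(" \t");
        line.remove_prefix(n == std::string_view::npos ? line.size() : n);
    };
    // Take characters up to the next SP/HT (or the end of the line).
    auto take_token = [&line] {
        std::string_view tok = line.substr(0, line.find_first_of(" \t"));
        line.remove_prefix(tok.size());
        return tok;
    };

    std::string_view version = take_token();
    if (version.substr(0, 5) != "HTTP/")
        return std::nullopt;

    skip_ws();
    std::string_view code = take_token();
    if (code.size() != 3)
        return std::nullopt;
    int value = 0;
    for (char c : code) {
        if (c < '0' || c > '9')
            return std::nullopt;
        value = value * 10 + (c - '0');
    }

    skip_ws();
    // Whatever remains is the (possibly empty) reason phrase.
    return StatusLine{std::string(version), value, std::string(line)};
}
```

For example, `"HTTP/1.1 \t 404   Not Found\n"` would parse to version `HTTP/1.1`, code `404` and reason `Not Found`, although a strict parser rejects it.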

Apr 27 '23 11:04 BenKaufmann

> Out of curiosity: is there a particular (technical) reason for not following the suggestions from RFC-2616/section-19.3:

Yes. That guidance was changed in rfc7230 which obsoletes rfc2616. Implementations still "may" be tolerant but it is noted that the practice exposes programs to vulnerabilities (see https://www.ietf.org/rfc/rfc7230.html#section-9.5).

Also note that rfc9110 and rfc9112 obsolete rfc7230, but these have not been exhaustively applied to Beast. I'm willing to consider the possibility of allowing more tolerant parsing if the newer RFCs say something on the matter.

Apr 27 '23 13:04 vinniefalco

Interesting, I didn't know that :+1: I don't quite understand how rfc7230/section-9.5 (and 3.3.3) invalidates the more lenient approach to start-line and header field parsing mentioned in both rfc7230 (section-3.5) and rfc9112 (section-2.2), though.

Anyhow, thanks for your quick response. I will then stick to patching the parser as I failed to get the icy_stream approach working with a boost::beast::ssl_stream in a reasonable amount of time.

Apr 27 '23 14:04 BenKaufmann

@klemens-morgenstern if you want to have a go at putting together a branch that has the more lenient parsing, I'd be open to it. We don't have any parse_options yet (à la URL or JSON), but we could in theory add them.

Apr 27 '23 14:04 vinniefalco

Are you sure that's a good idea? To me that sounds more like an area where http-proto would be customized.

May 10 '23 23:05 klemens-morgenstern

The reported issue correctly notes that clients may use a specific form of relaxed parsing. This basically means we have to implement it if we want full compatibility.

http-proto will have to do this too, please open a copy of this issue there.

May 12 '23 06:05 vinniefalco
