http icon indicating copy to clipboard operation
http copied to clipboard

HeaderValue::from_str does not validate that the string is valid ascii

Open tmpolaczyk opened this issue 2 years ago • 6 comments

The documentation of HeaderValue::from_str says that:

https://github.com/hyperium/http/blob/abe651273f4cf19cf9a247f376e9ece85becc722/src/header/value.rs#L101-L104

So I would expect this function to return an error when I pass it a non-ascii utf8 string which contains byte values in the 128-255 range. However the result is success.

Playground link

let x = http::header::HeaderValue::from_str("ñ");
panic!("{:?}", x);
thread 'main' panicked at 'Ok("\xc3\xb1")', src/main.rs:3:5

tmpolaczyk avatar Dec 27 '21 17:12 tmpolaczyk

Good catch, this does seem like a "bug".

seanmonstar avatar Dec 28 '21 00:12 seanmonstar

I disagree, the documentation should be changed instead. Header values are allowed to contain non-ASCII bytes, it's just control characters (with the exception of HTAB) that are disallowed.

Since every ASCII byte in a UTF-8 string is an ASCII char (multi-byte UTF-8 chars cannot contain bytes that look like ASCII), we should just relax this restriction and say that ASCII control characters with the exception of HTAB are disallowed and everything else is valid. This means the implementation stays the same (as that's what it's already doing).

lilyball avatar Feb 24 '22 23:02 lilyball

What's more, I would argue that from_static() should also allow non-ascii chars (and the documentation fixed, right now it says it allows 32-127 but it actually disallows 127 and allows 9), and to_str() should also stop limiting itself like this. I'm going to file a separate issue on the latter, but from_static() seems reasonable to cover here as "make it behave like from_str does".

lilyball avatar Feb 24 '22 23:02 lilyball

Filed as #527. So to summarize:

  • from_str is implemented correctly already and just needs a documentation update.
  • to_str should be changed as per #527 to be just std::str::from_utf8()
  • from_static should match from_str (it should test is_valid instead of is_visible_ascii).
  • Documentation should be vetted to ensure all descriptions of valid bytes are correct, i.e. that they allow 9 (\t) and disallow 127.

At this point the only use left of is_viisble_ascii is in the Debug impl. That's not a big issue as this is just debug, but it would be really nice if that could also detect valid UTF-8–encoded characters and allow those too. Ideally it would work like std::str::escape_debug() except including hex escapes for non-utf-8 bytes. But that's orthogonal to this issue.

lilyball avatar Feb 25 '22 00:02 lilyball

There is from_bytes() function that allow you use non-ascii bytes. And the name of from_str is misleading it should be try_from_str actually (but it's documented to be replaced in the future with TryFrom trait).

xoac avatar Feb 25 '22 00:02 xoac

@xoac There's no reason why HeaderValue::from_bytes(s.as_bytes()) should accept something that HeaderValue::from_str(s) does not (or rather, no reason why from_str() should reject something that from_bytes() allows). Also the documentation on from_bytes() once again forgets that \t is a valid byte.

Also I don't know why from_str() continues to say it will be replaced by TryFrom once the trait is stabilized, as the trait was stabilized back in 1.34 and HeaderValue already implements it. Heck, the TryFrom impl just delegates to FromStr (which has been stable since 1.0), so that just makes it even more confusing as to why HeaderValue::from_str() exists and claims it will be replaced. Also, these traits can't replace HeaderValue::from_static().

lilyball avatar Mar 06 '22 06:03 lilyball