
How to decode potential JavaScript

Open · annevk opened this issue 3 years ago · 6 comments

We might not always have an encoding, e.g., for fetch(..., { mode: "no-cors" }). Is it reasonable to always use UTF-8 when decoding the response body for the JavaScript parse check?
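If the answer is "always UTF-8", the decoding step would amount to the Encoding Standard's "UTF-8 decode": strip one leading UTF-8 BOM, then decode with replacement so malformed bytes become U+FFFD rather than errors. A minimal Python sketch, assuming Python's utf-8 codec with errors="replace" is an acceptable stand-in (its replacement behavior may not match the Encoding Standard in every edge case). The helper name utf8_decode_for_js_check is hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def utf8_decode_for_js_check(body: bytes) -> str:
    """Decode a response body as UTF-8, ignoring any declared charset.

    Hypothetical helper modelled on "UTF-8 decode": only a single leading
    UTF-8 BOM is stripped; UTF-16 BOMs are not honored.
    """
    if body.startswith(UTF8_BOM):
        body = body[len(UTF8_BOM):]
    # Malformed sequences are replaced with U+FFFD instead of raising.
    return body.decode("utf-8", errors="replace")


print(utf8_decode_for_js_check(b"\xef\xbb\xbfvar x = 1;"))        # var x = 1;
print(utf8_decode_for_js_check(b"x\xff") == "x\ufffd")            # True
```

The trade-off is that a charset declared in Content-Type is simply ignored, so a legacy-encoded script that relies on it would decode with replacement characters.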

annevk, Jan 11 '21

Looking at this again, in particular at https://html.spec.whatwg.org/#fetch-a-classic-script, I think the simplest option here is to pass the encoding along with the request. We then need to abstract or duplicate these steps (and maybe improve them while we're at it, especially getting the charset parameter from the Content-Type header):

  1. If response's Content Type metadata, if any, specifies a character encoding, and the user agent supports that encoding, then set character encoding to that encoding (ignoring the passed-in value).
  2. Let source text be the result of decoding response's body to Unicode, using character encoding as the fallback encoding.
  3. Let script be the result of creating a classic script given source text, settings object, response's url, options, and muted errors.

And then, if script's record is null, parsing failed.
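The steps above, plus the null-record check, can be sketched loosely in Python. This is an illustrative model, not spec text: the small LABELS table is a hypothetical, partial stand-in for the Encoding Standard's "get an encoding"; BOM sniffing models "decode", where a BOM overrides both the Content-Type charset and the fallback; and parse_classic_script is a hypothetical callback that returns None to represent a script whose record is null. Falling back to UTF-8 when no usable encoding exists at all is an assumption taken from the question above. Python's cp1252 codec only approximates windows-1252 (five bytes the Encoding Standard maps to C1 controls become U+FFFD here).

```python
from typing import Callable, Optional

# Hypothetical, partial label table: Encoding Standard label -> Python codec.
LABELS = {
    "utf-8": "utf-8", "utf8": "utf-8",
    "windows-1252": "cp1252", "iso-8859-1": "cp1252", "latin1": "cp1252",
    "iso-8859-2": "iso8859_2",
    "utf-16le": "utf-16-le", "utf-16be": "utf-16-be",
}

# BOMs checked by "decode"; a match overrides every other encoding choice.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def get_encoding(label: Optional[str]) -> Optional[str]:
    """Return a Python codec name for a label, or None if unsupported."""
    if label is None:
        return None
    return LABELS.get(label.strip().lower())


def decode_with_fallback(body: bytes, fallback: str) -> str:
    """BOM sniff first; otherwise decode with the fallback encoding."""
    for bom, codec in BOMS:
        if body.startswith(bom):
            return body[len(bom):].decode(codec, errors="replace")
    return body.decode(fallback, errors="replace")


def decode_potential_script(
    body: bytes,
    content_type_charset: Optional[str],
    passed_in_encoding: Optional[str],
    parse_classic_script: Callable[[str], Optional[object]],
) -> Optional[object]:
    # Step 1: a supported charset from Content-Type wins over the passed-in value.
    encoding = get_encoding(content_type_charset) or get_encoding(passed_in_encoding)
    # Assumption: with no usable encoding at all (e.g. fetch() in no-cors mode),
    # fall back to UTF-8.
    if encoding is None:
        encoding = "utf-8"
    # Step 2: decode, using that encoding only as the fallback (a BOM wins).
    source_text = decode_with_fallback(body, encoding)
    # Step 3: hand the text to the parser; None models a null record.
    return parse_classic_script(source_text)
```

For example, passing an identity function as the parser, the body b"var s = '\xe9';" yields "var s = 'é';" with content_type_charset="windows-1252", but "var s = '\ufffd';" with no charset and a UTF-8 fallback; a leading UTF-8 BOM would override the windows-1252 charset.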

@domenic does that seem right to you?

annevk, Jan 22 '21

I don't have the full context on what security guarantees we're trying to preserve here (is it bad to leak information about the Content-Type header?), but in terms of a spec refactoring, that seems reasonable.

domenic, Jan 22 '21

Re "and maybe improve them while we're at it, especially getting the charset parameter from the Content-Type header":

Basically every usage of "Content-Type metadata" in HTML could be improved by using the new MIME type getter, I think.
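A getter of that kind presumably means the Fetch Standard's "extract a MIME type" combined with "legacy extract an encoding", which work on the Content-Type header values rather than on opaque "Content Type metadata". The Python below is a heavily simplified model of those two algorithms, assuming naive comma and semicolon splitting (the real parsers handle quoted strings, HTTP whitespace and token validation) and a hypothetical, partial label table. It mainly shows one detail: a later value with the same essence but no charset keeps the earlier charset, while a different essence resets it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, partial label table (stand-in for the Encoding Standard).
LABELS = {"utf-8": "utf-8", "utf8": "utf-8", "windows-1252": "windows-1252",
          "iso-8859-1": "windows-1252", "shift_jis": "shift_jis"}

MimeType = Tuple[str, Dict[str, str]]  # (essence, parameters)


def parse_mime_type(value: str) -> Optional[MimeType]:
    """Simplified MIME type parser: no quoted-string or token validation."""
    essence, *params = value.split(";")
    essence = essence.strip().lower()
    if essence.count("/") != 1 or essence.startswith("/") or essence.endswith("/"):
        return None
    parameters: Dict[str, str] = {}
    for param in params:
        name, sep, val = param.partition("=")
        name = name.strip().lower()
        if sep and name and name not in parameters:  # first occurrence wins
            parameters[name] = val.strip().strip('"')
    return essence, parameters


def extract_mime_type(content_type_values: List[str]) -> Optional[MimeType]:
    charset: Optional[str] = None
    essence: Optional[str] = None
    mime_type: Optional[MimeType] = None
    for raw in content_type_values:
        for value in raw.split(","):  # naive: ignores commas inside quotes
            temp = parse_mime_type(value)
            if temp is None or temp[0] == "*/*":
                continue
            mime_type = temp
            if mime_type[0] != essence:
                # New essence: reset charset to this value's charset (if any).
                charset = mime_type[1].get("charset")
                essence = mime_type[0]
            elif "charset" not in mime_type[1] and charset is not None:
                # Same essence without a charset: inherit the earlier one.
                mime_type[1]["charset"] = charset
    return mime_type


def legacy_extract_encoding(mime_type: Optional[MimeType], fallback: str) -> str:
    """Return the charset's encoding if recognized, else the fallback."""
    if mime_type is None or "charset" not in mime_type[1]:
        return fallback
    return LABELS.get(mime_type[1]["charset"].strip().lower(), fallback)
```

For instance, "text/javascript;charset=shift_jis, text/javascript" resolves to shift_jis, whereas a later text/plain value drops the charset and the fallback encoding is used.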

domenic, Jan 22 '21

One risk here is that the attacker has control over the encoding, so this technically gives them more opportunity to find a way to get something parsed as JavaScript. In practice, it still seems hard to get something parsed as JavaScript this way, as the majority of significant bytes are in the ASCII range.
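The ASCII-range point can be checked quickly: in an ASCII-compatible encoding, every byte below 0x80 decodes to the same character, so choosing windows-1252 or ISO-8859-2 instead of UTF-8 changes nothing for a pure-ASCII body, whereas UTF-16LE pairs bytes into entirely different code units. A small Python illustration, using Python's codec names (cp1252, iso8859_2, utf-16-le) as stand-ins for the corresponding Encoding Standard encodings:

```python
# Pure-ASCII, JavaScript-looking bytes.
body = b"var x = 1;"

ascii_compatible = ["utf-8", "cp1252", "iso8859_2"]
decoded = {name: body.decode(name) for name in ascii_compatible}

# Every ASCII-compatible encoding yields the same source text.
assert len(set(decoded.values())) == 1

# UTF-16LE reads the same 10 bytes as 5 unrelated code units.
utf16 = body.decode("utf-16-le")
print(decoded["utf-8"])                       # var x = 1;
print(len(utf16), utf16 == decoded["utf-8"])  # 5 False
```

The encodings diverge only on bytes 0x80 and above, while the punctuators and keywords JavaScript syntax relies on are all ASCII; encodings that are not ASCII-compatible, such as UTF-16LE, are where an attacker's choice changes the most.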

annevk, Oct 04 '21

I included a fix for this in https://github.com/whatwg/fetch/pull/1442, which I think works. The HTML side will need to set the encoding on requests, but that's a very straightforward change.

And while it is unfortunate that the fallback encoding is in the hands of the attacker, this is no different from the status quo.

annevk, May 17 '22

I forgot that the response itself also carries encoding-related information. https://github.com/whatwg/fetch/pull/1447 tackles the first part of that. Once that lands, it should be easy to call from Fetch's ORB PR.

annevk, Jun 01 '22