URI.js fails to parse under charset=Shift_JIS with a syntax error caused by non-ASCII characters in RegExp

Open codefactor opened this issue 3 years ago • 2 comments

Steps to Reproduce:

  1. Serve an HTML page whose response header is content-type: text/html;charset=Shift_JIS
  2. Include the URI.js file with a script tag, using a compressed version of URI.js (I have not checked whether the uncompressed one has the same issue)

Unfortunately I can't find an easy way to share a link for this, but if necessary I could produce one, maybe with codesandbox.

Expected:

The JavaScript include should run, with no errors in the console.

Actual:

The JavaScript fails to parse, with this error in the console:

Uncaught SyntaxError: Unexpected token ':'

Root Cause:

There are non-ASCII characters inside regular expressions in a couple of places, for example: https://github.com/medialize/URI.js/blob/b655c1b972111ade9f181b02374305942e68e30a/src/URI.js#L231

The non-ASCII characters «»“”‘’ are interpreted differently when the response header sets the page charset to Shift_JIS. As a result, the regular expression is not closed properly and runs into the following lines, which produces a syntax error in the middle of the JSON. The same behavior is seen in Firefox and Chrome; I have not checked Edge.
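As a rough illustration of the root cause, the sketch below prints the UTF-8 bytes of the six characters in question. The helper name utf8Hex is made up for this example and is not part of URI.js. Every byte it prints is above 0x7F, which is where a Shift_JIS decoder and a UTF-8 decoder diverge. Shift_JIS treats bytes in 0x81-0x9F and from 0xE0 upward as the first half of a two-byte character, so a trailing byte such as 0x99 can pair with whatever byte follows it. That is presumably how the closing `]` of the character class gets lost, though I have not traced the decoder step by step.

```javascript
// The six characters from the character class, written as escapes so that
// this snippet itself stays ASCII-only.
const chars = ['\u00ab', '\u00bb', '\u201c', '\u201d', '\u2018', '\u2019'];

// Hypothetical helper (not part of URI.js): hex dump of a string's UTF-8 bytes.
function utf8Hex(s) {
  return Array.from(new TextEncoder().encode(s), function (b) {
    return b.toString(16).padStart(2, '0');
  }).join(' ');
}

chars.forEach(function (c) {
  const codePoint = 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(codePoint + '  ' + utf8Hex(c));
});
// U+00AB  c2 ab
// U+00BB  c2 bb
// U+201C  e2 80 9c
// U+201D  e2 80 9d
// U+2018  e2 80 98
// U+2019  e2 80 99
```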

Proposed solution:

Don't use literal non-ASCII characters, which are unsafe when the page charset changes. Instead, build the RegExp from a string in which those characters are written as escapes:

  URI.find_uri_expression = new RegExp("\\b((?:[a-z][\\w-]+:(?:\\/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}\\/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?\xab\xbb\u201c\u201d\u2018\u2019]))", "ig");

One other place: https://github.com/medialize/URI.js/blob/b655c1b972111ade9f181b02374305942e68e30a/src/URI.js#L238

Could update to this:

    trim: new RegExp("[`!()\\[\\]{};:'\".,<>?\xab\xbb\u201c\u201d\u201E\u2018\u2019]+$"),

These are two places; there might be more.
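To look for the remaining places, something like the following sketch could be run over the source text. Both function names are hypothetical and not part of URI.js. It assumes that rewriting each non-ASCII UTF-16 code unit as a \uXXXX escape is acceptable everywhere it occurs, which holds for ordinary string and regular expression literals but would change the text of comments and the raw strings of tagged templates.

```javascript
// Hypothetical helper: 1-based numbers of the lines that contain any
// character outside the ASCII range.
function findNonAsciiLines(source) {
  const hits = [];
  source.split('\n').forEach(function (line, index) {
    if (/[^\x00-\x7f]/.test(line)) {
      hits.push(index + 1);
    }
  });
  return hits;
}

// Hypothetical helper: rewrite every non-ASCII UTF-16 code unit as \uXXXX.
function escapeNonAscii(source) {
  return source.replace(/[^\x00-\x7f]/g, function (ch) {
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

// Small in-memory sample; the escapes below produce real non-ASCII
// characters inside the sample string.
const sample = 'var a = 1;\ntrim: /[\u00ab\u201c]+$/\nvar b = 2;';

console.log(findNonAsciiLines(sample)); // [ 2 ]
console.log(escapeNonAscii(sample).split('\n')[1]); // trim: /[\u00ab\u201c]+$/
```

The output of escapeNonAscii contains only ASCII bytes, so it decodes the same way under UTF-8 and Shift_JIS for the characters used here.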

codefactor, Jun 07 '22 21:06

To help move this along, here is my attempt at a PR: https://github.com/medialize/URI.js/pull/416

codefactor, Jun 07 '22 22:06

As an update: we have other resources that might require the UTF-8 charset, so we will fix this issue on our side by consistently using UTF-8.

However, it still might be a good idea for the JavaScript file to contain only ASCII characters, so that this syntax error doesn't come up if for whatever reason the page gets switched to a Japanese charset.

That said, the escaped form does increase the size of the file by a few bytes, so it's a small trade-off.

codefactor, Jun 08 '22 16:06