Aura.Html icon indicating copy to clipboard operation
Aura.Html copied to clipboard

JsEscaper duplicating json_encode() ?

Open mindplay-dk opened this issue 9 years ago • 10 comments

Is JsEscaper duplicating the behavior of json_encode()?

Isn't this code essentially equivalent to, say, substr(json_encode($value), 1, -1) ?

Is there any benefit to re-implementing this?

Also, is it necessary to escape multi-byte UTF-8 characters in the first place? json_encode() lets you disable unnecessary unicode escapes with the JSON_UNESCAPED_UNICODE flag.

For non-UTF-8 (e.g. ASCII) output, unicode escaping is required - but the likely use-case is outputting UTF-8 strings, right? So why escape everything as ASCII?

mindplay-dk avatar Aug 08 '16 13:08 mindplay-dk

@mindplay-dk that's an interesting question which I am not sure what is the best choice.

harikt avatar Aug 08 '16 15:08 harikt

@harikt mainly I have two reservations about aggressively escaping all unicode characters:

  1. Data overhead: any non-ASCII characters come out several times more bytes than necessary.
  2. Human overhead: the resulting output is much harder for a person to read when debugging.

As far as I can figure, the only justification for a custom implementation, is the fact that json_encode() only supports UTF-8.

However, UTF-8 is also by far the most common use-case, and the escaper actually knows the target encoding, so maybe in this case json_encode() could be used as an optimization?

Here's the (UTF-8 only) function I ended up with:

/**
 * Escape string for use in a JavaScript string literal context, e.g.:
 *
 *     <script>
 *         document.title = 'Page: <%= html(js($title)) %>';
 *     </script>
 *
 * Note the use of both `html()` and `js()` here, since there are actually two nested
 * contexts in this example: a JavaScript string in an HTML context.
 *
 * To illustrate why HTML context isn't simply assumed, here is an example using both
 * `html()` and `js()`, since the following is JavaScript in an HTML-attribute context:
 *
 *     <button onclick="alert('Hello, <?= attr(js($username)) ?>')">
 *
 * @param string|mixed $value UTF-8 string (or any string-castable value)
 *
 * @return string Javascript-string-escaped UTF-8 string
 */
function js($value)
{
    $value = (string) $value;

    if ($value === "") {
        return $value;
    }

    $str = json_encode($value, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new InvalidArgumentException(json_last_error_msg());
    }

    return preg_replace("#'#u", "\\'", substr($str, 1, -1));
}

This won't escape unicode characters or slashes - but will escape the single quote, which isn't escaped by json_encode() because it quotes strings with double quotes; but we want this function to work in the context of either a single or double-quoted JS string literal.

Here's my (Codeception Cest) spec, if that's helpful:

    public function escapeJavascriptStrings(UnitTester $I)
    {
        $I->assertSame('\\\'', js('\''), "single quotes are escaped");

        $I->assertSame('\\"', js('"'), "double quotes are escaped");

        $I->assertSame('\\\\', js('\\'), "backslashes are escaped");

        $I->assertSame('æøå', js('æøå'), "UTF-8 characters are preserved");

        $I->assertSame("", js(null), "null treated as empty string");

        $I->assertSame("123", js(123), "numeric values treated as strings");
    }

mindplay-dk avatar Aug 08 '16 15:08 mindplay-dk

@mindplay-dk your functions are quite impressive. But I am not sure if there is any edge case we are missing.

I was also looking at https://docs.zendframework.com/zend-escaper/escaping-javascript/ . May be there is some where a test suite against xss where we could try.

Thanks again for exploring your time against the same.

harikt avatar Sep 29 '16 07:09 harikt

@mindplay-dk The note from @harikt regarding the Zend Escaper protections (from which the escaping in this library is derived) seems accurate to me; i.e., json_encode() by itself is insufficient.

@jakejohns If you don't mind I can close this issue, or you can.

pmjones avatar Feb 09 '17 16:02 pmjones

json_encode() by itself is insufficient.

Yes. Maybe you didn't notice the examples in the doc-block?

I regard JS and HTML escaping as being different things - the js() function deliberately does only JavaScript string-escape, I have separate html() and attr() functions to further escape the string when used in HTML or HTML-attribute.

The reason I decided to go this way, is because I might have a view for an application/javascript response body, or I might be emitting content inside a CDATA block - in those case, I'd be calling js() only, and don't want it to do anything other than escaping the string for insertion into a JS context.

If I'm inserting into another context, I make two function-calls.

This takes a little getting used to, but I personally found that this leads to a lower total number of functions, combinations of which cover just about any use-case.

For example, it composes well when I'm building initialization code:

require(["foo"], function (foo) {
    foo.init({
        path: "/app-root/<?= js($path) ?>"
    });
});

And then injecting that into an HTML page:

<script>
    <?= html($init) ?>
</script>

I mean, if you care about HTML views only, yeah, you might as well go ahead and have that function escape for HTML right away - I personally like having functions that do one thing.

I have a total of four helper-functions for views and don't anticipate needing any more: js() for JS string-context, json() for non-string values (wrapper around json_encode() with sensible defaults), html() for HTML-tags and attr() for HTML-attributes.

I mix these as needed, which isn't "safe by default" in HTML contexts, but forces me to think about contexts wrapping contexts which I find reduces the risk of blindly trusting these functions for any given scenario.

I realize that's mostly a matter of taste and personal preference :-)

mindplay-dk avatar Feb 09 '17 17:02 mindplay-dk

Maybe I'm missing your point buried in the wall-of-text. Can you state the change you'd like to see in a sentence or two, or in a PR?

pmjones avatar Feb 09 '17 17:02 pmjones

Heh, no, I'm not asking you to change anything, I was just trying to explain my own approach.

The reason I even commented in the first place was simply because I was referencing your implementation, trying to learn if there was something I was missing, or why it is you need to custom-code stuff in PHP that's already available as functions.

I think we just have different goals and objectives, so you guys should probably just keep doing what you're doing :-)

mindplay-dk avatar Feb 09 '17 20:02 mindplay-dk

trying to learn if there was something I was missing

Ah so! If you have links to share on your own implementations, I'm sure we'd be interested to seem them.

pmjones avatar Feb 09 '17 20:02 pmjones

The project is closed-source (for the time being) but here's a copy/paste of the view-functions:

https://gist.github.com/mindplay-dk/2e3bdeec957d6f51197b4425c5192519

It's just a tiny set of four simple functions - the documentation is almost more important than the actual implementations, which are pretty trivial, so I copy/pasted the relevant section of the documentation as well.

Like I said, I am by no means suggesting you abandon your own approach, I was merely interested in sharing an alternative. Using these functions does require a little more thinking - I happen to think that's a good thing, but completely understand if that's not everyone's cup of tea :-)

mindplay-dk avatar Feb 11 '17 14:02 mindplay-dk

Decided to package it real quick with tests and all. Here.

mindplay-dk avatar Feb 11 '17 16:02 mindplay-dk