JsEscaper duplicating json_encode() ?
Is JsEscaper duplicating the behavior of json_encode()?
Isn't this code essentially equivalent to, say, substr(json_encode($value), 1, -1) ?
Is there any benefit to re-implementing this?
Also, is it necessary to escape multi-byte UTF-8 characters in the first place? json_encode() lets you disable unnecessary unicode escapes with the JSON_UNESCAPED_UNICODE flag.
For non-UTF-8 (e.g. ASCII) output, unicode escaping is required - but the likely use-case is outputting UTF-8 strings, right? So why escape everything as ASCII?
@mindplay-dk that's an interesting question which I am not sure what is the best choice.
@harikt mainly I have two reservations about aggressively escaping all unicode characters:
- Data overhead: any non-ASCII characters come out several times more bytes than necessary.
- Human overhead: the resulting output is much harder for a person to read when debugging.
As far as I can figure, the only justification for a custom implementation, is the fact that json_encode() only supports UTF-8.
However, UTF-8 is also by far the most common use-case, and the escaper actually knows the target encoding, so maybe in this case json_encode() could be used as an optimization?
Here's the (UTF-8 only) function I ended up with:
/**
* Escape string for use in a JavaScript string literal context, e.g.:
*
* <script>
* document.title = 'Page: <%= html(js($title)) %>';
* </script>
*
* Note the use of both `html()` and `js()` here, since there are actually two nested
* contexts in this example: a JavaScript string in an HTML context.
*
* To illustrate why HTML context isn't simply assumed, here is an example using both
* `html()` and `js()`, since the following is JavaScript in an HTML-attribute context:
*
* <button onclick="alert('Hello, <?= attr(js($username)) ?>')">
*
* @param string|mixed $value UTF-8 string (or any string-castable value)
*
* @return string Javascript-string-escaped UTF-8 string
*/
function js($value)
{
$value = (string) $value;
if ($value === "") {
return $value;
}
$str = json_encode($value, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
if (json_last_error() !== JSON_ERROR_NONE) {
throw new InvalidArgumentException(json_last_error_msg());
}
return preg_replace("#'#u", "\\'", substr($str, 1, -1));
}
This won't escape unicode characters or slashes - but will escape the single quote, which isn't escaped by json_encode() because it quotes strings with double quotes; but we want this function to work in the context of either a single or double-quoted JS string literal.
Here's my (Codeception Cest) spec, if that's helpful:
public function escapeJavascriptStrings(UnitTester $I)
{
$I->assertSame('\\\'', js('\''), "single quotes are escaped");
$I->assertSame('\\"', js('"'), "double quotes are escaped");
$I->assertSame('\\\\', js('\\'), "backslashes are escaped");
$I->assertSame('æøå', js('æøå'), "UTF-8 characters are preserved");
$I->assertSame("", js(null), "null treated as empty string");
$I->assertSame("123", js(123), "numeric values treated as strings");
}
@mindplay-dk your functions are quite impressive. But I am not sure if there is any edge case we are missing.
I was also looking at https://docs.zendframework.com/zend-escaper/escaping-javascript/ . May be there is some where a test suite against xss where we could try.
Thanks again for exploring your time against the same.
@mindplay-dk The note from @harikt regarding the Zend Escaper protections (from which the escaping in this library is derived) seems accurate to me; i.e., json_encode() by itself is insufficient.
@jakejohns If you don't mind I can close this issue, or you can.
json_encode()by itself is insufficient.
Yes. Maybe you didn't notice the examples in the doc-block?
I regard JS and HTML escaping as being different things - the js() function deliberately does only JavaScript string-escape, I have separate html() and attr() functions to further escape the string when used in HTML or HTML-attribute.
The reason I decided to go this way, is because I might have a view for an application/javascript response body, or I might be emitting content inside a CDATA block - in those case, I'd be calling js() only, and don't want it to do anything other than escaping the string for insertion into a JS context.
If I'm inserting into another context, I make two function-calls.
This takes a little getting used to, but I personally found that this leads to a lower total number of functions, combinations of which cover just about any use-case.
For example, it composes well when I'm building initialization code:
require(["foo"], function (foo) {
foo.init({
path: "/app-root/<?= js($path) ?>"
});
});
And then injecting that into an HTML page:
<script>
<?= html($init) ?>
</script>
I mean, if you care about HTML views only, yeah, you might as well go ahead and have that function escape for HTML right away - I personally like having functions that do one thing.
I have a total of four helper-functions for views and don't anticipate needing any more: js() for JS string-context, json() for non-string values (wrapper around json_encode() with sensible defaults), html() for HTML-tags and attr() for HTML-attributes.
I mix these as needed, which isn't "safe by default" in HTML contexts, but forces me to think about contexts wrapping contexts which I find reduces the risk of blindly trusting these functions for any given scenario.
I realize that's mostly a matter of taste and personal preference :-)
Maybe I'm missing your point buried in the wall-of-text. Can you state the change you'd like to see in a sentence or two, or in a PR?
Heh, no, I'm not asking you to change anything, I was just trying to explain my own approach.
The reason I even commented in the first place was simply because I was referencing your implementation, trying to learn if there was something I was missing, or why it is you need to custom-code stuff in PHP that's already available as functions.
I think we just have different goals and objectives, so you guys should probably just keep doing what you're doing :-)
trying to learn if there was something I was missing
Ah so! If you have links to share on your own implementations, I'm sure we'd be interested to seem them.
The project is closed-source (for the time being) but here's a copy/paste of the view-functions:
https://gist.github.com/mindplay-dk/2e3bdeec957d6f51197b4425c5192519
It's just a tiny set of four simple functions - the documentation is almost more important than the actual implementations, which are pretty trivial, so I copy/pasted the relevant section of the documentation as well.
Like I said, I am by no means suggesting you abandon your own approach, I was merely interested in sharing an alternative. Using these functions does require a little more thinking - I happen to think that's a good thing, but completely understand if that's not everyone's cup of tea :-)
Decided to package it real quick with tests and all. Here.