[Feature Request] Ability to disable base64 for js and css content
When downloading a website I want to be able to read the produced CSS and JS without further steps. So instead of
<script src="data:text/javascript;charset=utf-8;base64,CgkJCS8qIGRlY3J5cHQgaGVscGVyIGZ1bmN0aW9uICovCgkJZnVuY3Rpb24gZGVjcnlwdENoYXJjb2RlKG4sc3RhcnQsZW5kLG9mZnNldCkgewoJCQluID0gbiArIG9mZnNldDsKCQkJaWYgKG9mZnNldCA+IDAgJiYgbiA+IGVuZCkgewoJCQkJbiA9IHN0YXJ0ICsgKG4gLSBlbmQgLSAxKTsKCQkJfSBlbHNlIGlmIChvZmZzZXQgPCAwICYmIG4gPCBzdGFydCkgewoJCQkJbiA9IGVuZCAtIChzdGFydCAtIG4gLSAxKTsKCQkJfQoJCQlyZXR1cm4gU3RyaW5nLmZyb21DaGFyQ29kZShuKTsKCQl9CgkJCS8qIGRlY3J5cHQgc3RyaW5nICovCgkJZnVuY3Rpb24gZGVjcnlwdFN0cmluZyhlbmMsb2Zmc2V0KSB7CgkJCXZhciBkZWMgPSAiIjsKCQkJdmFyIGxlbiA9IGVuYy5sZW5ndGg7CgkJCWZvcih2YXIgaT0wOyBpIDwgbGVuOyBpKyspIHsKCQkJCXZhciBuID0gZW5jLmNoYXJDb2RlQXQoaSk7CgkJCQlpZiAobiA+PSAweDJCICYmIG4gPD0gMHgzQSkgewoJCQkJCWRlYyArPSBkZWNyeXB0Q2hhcmNvZGUobiwweDJCLDB4M0Esb2Zmc2V0KTsJLyogMC05IC4gLCAtICsgLyA6ICovCgkJCQl9IGVsc2UgaWYgKG4gPj0gMHg0MCAmJiBuIDw9IDB4NUEpIHsKCQkJCQlkZWMgKz0gZGVjcnlwdENoYXJjb2RlKG4sMHg0MCwweDVBLG9mZnNldCk7CS8qIEEtWiBAICovCgkJCQl9IGVsc2UgaWYgKG4gPj0gMHg2MSAmJiBuIDw9IDB4N0EpIHsKCQkJCQlkZWMgKz0gZGVjcnlwdENoYXJjb2RlKG4sMHg2MSwweDdBLG9mZnNldCk7CS8qIGEteiAqLwoJCQkJfSBlbHNlIHsKCQkJCQlkZWMgKz0gZW5jLmNoYXJBdChpKTsKCQkJCX0KCQkJfQoJCQlyZXR1cm4gZGVjOwoJCX0KCQkJLyogZGVjcnlwdCBzcGFtLXByb3RlY3RlZCBlbWFpbHMgKi8KCQlmdW5jdGlvbiBsaW5rVG9fVW5DcnlwdE1haWx0byhzKSB7CgkJCWxvY2F0aW9uLmhyZWYgPSBkZWNyeXB0U3RyaW5nKHMsLTIpOwoJCX0KCQk=" type="text/javascript"></script>
I want
<script>
    /* decrypt helper function */
    function decryptCharcode(n,start,end,offset) {
        n = n + offset;
        if (offset > 0 && n > end) {
            n = start + (n - end - 1);
        } else if (offset < 0 && n < start) {
            n = end - (start - n - 1);
        }
        return String.fromCharCode(n);
    }
    /* decrypt string */
    function decryptString(enc,offset) {
        var dec = "";
        var len = enc.length;
        for(var i=0; i < len; i++) {
            var n = enc.charCodeAt(i);
            if (n >= 0x2B && n <= 0x3A) {
                dec += decryptCharcode(n,0x2B,0x3A,offset); /* 0-9 . , - + / : */
            } else if (n >= 0x40 && n <= 0x5A) {
                dec += decryptCharcode(n,0x40,0x5A,offset); /* A-Z @ */
            } else if (n >= 0x61 && n <= 0x7A) {
                dec += decryptCharcode(n,0x61,0x7A,offset); /* a-z */
            } else {
                dec += enc.charAt(i);
            }
        }
        return dec;
    }
    /* decrypt spam-protected emails */
    function linkTo_UnCryptMailto(s) {
        location.href = decryptString(s,-2);
    }
</script>
This could be exposed as two switches, "--no-css-encode" and "--no-script-encode".
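For context, the "further step" currently needed to read an embedded script is decoding the base64 payload of its data URL. The following is a minimal sketch of that step using only the standard library; the function names `base64_decode` and `decode_base64_data_url` are made up for illustration and are not part of monolith.

```rust
/// Decode standard base64 (RFC 4648 alphabet, optional '=' padding).
/// Returns None if a character outside the alphabet is encountered.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len() / 4 * 3);
    let mut buffer: u32 = 0; // bits collected so far
    let mut bits = 0; // number of valid bits in `buffer`
    for byte in input.bytes() {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break, // padding marks the end of the payload
            _ => return None,
        };
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            // A full byte is available: emit it and keep the leftover bits
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1;
        }
    }
    Some(out)
}

/// Extract and decode the payload of a `data:<header>;base64,<payload>` URL.
/// Assumption: the payload is UTF-8 text; other charsets are not handled here.
fn decode_base64_data_url(url: &str) -> Option<String> {
    let rest = url.strip_prefix("data:")?;
    let (header, payload) = rest.split_once(',')?;
    if !header.ends_with(";base64") {
        return None;
    }
    String::from_utf8(base64_decode(payload)?).ok()
}
```

Calling `decode_base64_data_url` on the value of the src attribute shown above would yield the plain script text, which is the manual step the proposed switches would make unnecessary.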
Hi Lucy! Thank you for this feature request.
Implementing something like this has crossed my mind early in the project, mainly to reduce the output file size.
The issue I encountered is that most linked scripts (and CSS) today use the defer or async attribute: they get loaded in parallel with the linking document and are only executed once they have been retrieved in full. Unwrapping them would break many pages, and these flags would only really work for (optimistically) 50% of linked styles and scripts. I could still do it, perhaps even by default for assets that aren't meant to be loaded asynchronously, or allow the user to enforce the unwrapping and ignore the asynchronous behavior.
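The two options mentioned here (unwrapping by default only for non-asynchronous assets, or letting the user force it) could be sketched roughly as follows. This is only an illustration: `EmbedMode`, `choose_embed_mode`, and the `force_inline` parameter are hypothetical names, not existing monolith code or flags.

```rust
/// How a linked script should be embedded (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
enum EmbedMode {
    /// Put the script source directly inside the <script> element.
    InlineText,
    /// Keep the script behind a data URL in the src attribute.
    DataUrl,
}

/// Pick an embed mode from the attribute names present on a <script> element.
/// `force_inline` models a user option that ignores async/defer.
fn choose_embed_mode(attr_names: &[&str], force_inline: bool) -> EmbedMode {
    // HTML attribute names are case-insensitive, so compare accordingly
    let loads_asynchronously = attr_names
        .iter()
        .any(|name| name.eq_ignore_ascii_case("defer") || name.eq_ignore_ascii_case("async"));

    if loads_asynchronously && !force_inline {
        EmbedMode::DataUrl
    } else {
        EmbedMode::InlineText
    }
}
```

With this shape, a plain `<script src>` is unwrapped, while a `<script defer src>` keeps its data URL unless the user explicitly asks otherwise.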
Thanks for the response. I personally would be happy with those restrictions, and other users might be too if they are properly documented. If you don't have time to work on this, I can try to find my way through your code.
So for js it was pretty easy:
if options.no_js_encode {
    // Decode the fetched bytes using the declared charset, falling back to lossy UTF-8
    let data = if let Some(encoding) = Encoding::for_label(charset.as_bytes()) {
        let (string, _, _) = encoding.decode(&data);
        string.to_string()
    } else {
        String::from_utf8_lossy(&data).to_string()
    };
    // Append the script source as a text child of the <script> node
    let node_data = NodeData::Text { contents: RefCell::new(format_tendril!("{}", data)) };
    node.children.borrow_mut().push(Node::new(node_data));
} else {
    // Create and embed data URL
    let mut data_url = create_data_url(&media_type, &charset, &data, &final_url);
    data_url.set_fragment(resolved_url.fragment());
    set_node_attr(node, attr_name, Some(data_url.to_string()));
}
However, for CSS this is more complicated, since it uses the <link> element for external stylesheets and <style> for inline styles.
Regarding the defer attribute, we could still keep both, since the src attribute should have priority over the containing text.
According to the html specs: https://html.spec.whatwg.org/multipage/scripting.html
- If el has a src content attribute, then:
  (1). If el's type is "importmap", then queue an element task on the DOM manipulation task source given el to fire an event named error at el, and return.
  (2). Let src be the value of el's src attribute.
  (3). If src is the empty string, then queue an element task on the DOM manipulation task source given el to fire an event named error at el, and return.
  (4). Set el's from an external file to true.
According to this, the src attribute is used first; direct execution of the inline text is only available if all else fails (see point 34 in the "Otherwise" section).
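The precedence described in the quoted steps can be modelled roughly like this. It is a deliberately simplified sketch of the spec excerpt above (it ignores script types, modules, and everything else in the algorithm), and `ScriptSource` and `executed_source` are names invented for the example.

```rust
/// What a browser ends up running for a <script> element (simplified model).
#[derive(Debug, PartialEq)]
enum ScriptSource<'a> {
    /// The script is fetched from the URL in the src attribute.
    External(&'a str),
    /// The text inside the element is executed.
    Inline(&'a str),
    /// Nothing runs (empty src fires an error event; empty inline text is skipped).
    Nothing,
}

/// `src` is None when the attribute is absent, Some(value) when it is present.
fn executed_source<'a>(src: Option<&'a str>, inline_text: &'a str) -> ScriptSource<'a> {
    match src {
        // A present but empty src attribute is an error; inline text is NOT used
        Some("") => ScriptSource::Nothing,
        // A present src attribute always wins over the inline text
        Some(url) => ScriptSource::External(url),
        // Without src, only non-empty inline text is executed
        None if inline_text.is_empty() => ScriptSource::Nothing,
        None => ScriptSource::Inline(inline_text),
    }
}
```

In this model, a node that carries both a data URL in src and the decoded text as a child still resolves to the external source, which is the behaviour the proposal relies on.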
I would be happy for any feedback you might have
I have verified this behavior using this simple setup:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
<script defer src="index.js">
console.log("You would think!")
</script>
</head>
<body>
awdawd
</body>
</html>
// index.js
console.log("I have overridden you!")
Only "I have overridden you!" gets printed to the console. This behaviour is the same in Edge (Chromium) and Firefox (Gecko). I have not tested in Ladybird.
From my memory, I believe that SCRIPT tags can either have content or "src", not both... it must be somewhere in the spec.
That is correct, the linked spec says so. It does, however, allow scripts with a src attribute to contain comments as documentation. Since the browser takes the src over the inline text, which is only supposed to be a comment anyway, it's safe to assume that this will work (as demonstrated by my simple showcase).
The next release will embed JS files as text inside <script> tags by default. I might add a flag to force using base64 for scripts, but don't think there'll be any need — using data URLs for scripts adds a giant overhead, both storage- and execution-wise.
The whole async/defer situation with data URLs is quite similar to loading the script instantly, and if any type of race condition happens, I'd say it's an edge case and not so much of monolith's fault.
Styles are a bit trickier: <style> and <link> are quite different, but I think it's doable, even if not in 100% of cases.
JS is now embedded as plain text by default (v2.10.0 and up). I'll take care of LINK/STYLE next, can't promise when exactly, but it's high up on the list.
Didn't find this issue before writing https://github.com/Y2Z/monolith/pull/467 - which does something similar but different by creating data-URLs without base64 if the result is smaller (probably most of the time for plaintext files)
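The idea of only using base64 when it is actually the smaller option can be illustrated with a size comparison like the one below. This is not the code from the linked pull request; it is a sketch that assumes only RFC 3986 "unreserved" characters are left unescaped, which is stricter than what a real data URL needs, and all function names are invented for the example.

```rust
/// Which payload encoding produces the shorter data URL body.
#[derive(Debug, PartialEq)]
enum Payload {
    Base64,
    PercentEncoded,
}

/// Length of the padded base64 encoding of `len` input bytes.
fn base64_len(len: usize) -> usize {
    (len + 2) / 3 * 4
}

/// Assumption of this sketch: only RFC 3986 unreserved characters stay as-is.
fn is_unreserved(byte: u8) -> bool {
    byte.is_ascii_alphanumeric() || matches!(byte, b'-' | b'.' | b'_' | b'~')
}

/// Length of the percent-encoded form: 1 byte if unreserved, 3 bytes ("%XX") otherwise.
fn percent_encoded_len(data: &[u8]) -> usize {
    data.iter().map(|&b| if is_unreserved(b) { 1 } else { 3 }).sum()
}

/// Percent-encode `data` using the unreserved set above.
fn percent_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity(data.len());
    for &b in data {
        if is_unreserved(b) {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Prefer percent-encoding when it is not longer than base64.
fn smaller_payload(data: &[u8]) -> Payload {
    if percent_encoded_len(data) <= base64_len(data.len()) {
        Payload::PercentEncoded
    } else {
        Payload::Base64
    }
}
```

Under this strict escaping rule, punctuation-heavy input can still come out larger than base64, so how often the plain form wins depends heavily on which characters the encoder leaves unescaped.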
CSS is on its way to being included as <style>...</style> in the next couple of releases, but urlencode is still valid for data URLs that can't be embedded in any other way besides "href".