bug: Exception thrown for `script`, `style`, and `link` tags no-value `type` attribute
For the config:
{
removeScriptTypeAttributes: true,
removeStyleLinkTypeAttributes: true
}
The library throws an exception for the following inputs:
-
<script type />TypeError: Cannot read properties of undefined (reading 'split') -
<style type />TypeError: Cannot read properties of undefined (reading 'toLowerCase') -
<link type />TypeError: Cannot read properties of undefined (reading 'toLowerCase')
All three errors seem to originate from this if statement: https://github.com/terser/html-minifier-terser/blob/14b44c7bde1037ece4eae3e12a02493db71d660a/src/htmlminifier.js#L552-L559
Both isScriptTypeAttribute() and isStyleLinkTypeAttribute() don't seem to handle the case where attrValue is undefined.
If treating type the same as type="" is acceptable, then this results in a green CI build:
diff --git a/src/htmlminifier.js b/src/htmlminifier.js
index 8c2b1cb..fa9ee66 100644
--- a/src/htmlminifier.js
+++ b/src/htmlminifier.js
@@ -166,12 +166,12 @@ const keepScriptsMimetypes = new Set([
'module'
]);
-function isScriptTypeAttribute(attrValue) {
+function isScriptTypeAttribute(attrValue = "") {
attrValue = trimWhitespace(attrValue.split(/;/, 2)[0]).toLowerCase();
return attrValue === '' || executableScriptsMimetypes.has(attrValue);
}
-function keepScriptTypeAttribute(attrValue) {
+function keepScriptTypeAttribute(attrValue = "") {
attrValue = trimWhitespace(attrValue.split(/;/, 2)[0]).toLowerCase();
return keepScriptsMimetypes.has(attrValue);
}
@@ -189,7 +189,7 @@ function isExecutableScript(tag, attrs) {
return true;
}
-function isStyleLinkTypeAttribute(attrValue) {
+function isStyleLinkTypeAttribute(attrValue = "") {
attrValue = trimWhitespace(attrValue).toLowerCase();
return attrValue === '' || attrValue === 'text/css';
}
I'll be happy to open a PR if this is an acceptable fix.
Hi @KartikSoneji,
Thanks for reporting this.
Personally I have never seen type without any ="...".
Can you provide some examples and use cases, where something like this appears or is used?
https://www.w3.org/TR/REC-html40/interact/scripts.html#h-18.2.1 says:
" type = content-type [CI] This attribute specifies the scripting language of the element's contents and overrides the default scripting language. The scripting language is specified as a content type (e.g., "text/javascript"). Authors must supply a value for this attribute. There is no default value for this attribute. "
It seems WHATWG has a different specification:
https://html.spec.whatwg.org/multipage/scripting.html#the-script-element
The type attribute allows customization of the type of script represented:
-
Omitting the attribute, setting it to the empty string, or setting it to a JavaScript MIME type essence match, means that the script is a classic script, to be interpreted according to the JavaScript Script top-level production. Classic scripts are affected by the async and defer attributes, but only when the src attribute is set. Authors should omit the type attribute instead of redundantly setting it.
-
Setting the attribute to an ASCII case-insensitive match for the string "module" means that the script is a JavaScript module script, to be interpreted according to the JavaScript Module top-level production. Module scripts are not affected by the defer attribute, but are affected by the async attribute (regardless of the state of the src attribute).
-
Setting the attribute to any other value means that the script is a data block, which is not processed. None of the script attributes (except type itself) have any effect on data blocks. Authors must use a valid MIME type string that is not a JavaScript MIME type essence match to denote data blocks.
Omitting the attribute, setting it to the empty string, or setting it to a
This still does not cover your example (attribute but no value).
Afaik type without any value would be a boolean. Not an empty or normal string (at least this is how attributes like checked, required and disabled work).
Hi @DanielRuf
Personally I have never seen
typewithout any="...".
Me too, this is definitely not valid HTML, nor makes much sense.
Can you provide some examples and use cases, where something like this appears or is used?
I actually found this (bug?) while debugging a webpack build crash.
Someone had accidentally removed the ="image/png" part of the favicon meta tag.
While I agree this is a rather niche use-case, debugging this was not exactly fun. Also, I don't think the crashing on that type of input is the best course of action. At the very least there can be a better error message to help find where the exception is coming from.
Hi, is there any update on the issue?
If treating
typethe same astype=""is acceptable, then this results in a green CI build:
Not sure.
I'll be happy to open a PR if this is an acceptable fix.
A PR is very welcome. The current maintainers can then review the proposed changes.