html-minifier-terser icon indicating copy to clipboard operation
html-minifier-terser copied to clipboard

bug: Exception thrown for `script`, `style`, and `link` tags no-value `type` attribute

Open KartikSoneji opened this issue 3 years ago • 8 comments

For the config:

{
  removeScriptTypeAttributes: true,
  removeStyleLinkTypeAttributes: true
}

The library throws an exception for the following inputs:

  1. <script type /> TypeError: Cannot read properties of undefined (reading 'split')
  2. <style type /> TypeError: Cannot read properties of undefined (reading 'toLowerCase')
  3. <link type /> TypeError: Cannot read properties of undefined (reading 'toLowerCase')

All three errors seem to originate from this if statement: https://github.com/terser/html-minifier-terser/blob/14b44c7bde1037ece4eae3e12a02493db71d660a/src/htmlminifier.js#L552-L559

Both isScriptTypeAttribute() and isStyleLinkTypeAttribute() don't seem to handle the case where attrValue is undefined.

KartikSoneji avatar Aug 18 '22 20:08 KartikSoneji

If treating type the same as type="" is acceptable, then this results in a green CI build:

diff --git a/src/htmlminifier.js b/src/htmlminifier.js
index 8c2b1cb..fa9ee66 100644
--- a/src/htmlminifier.js
+++ b/src/htmlminifier.js
@@ -166,12 +166,12 @@ const keepScriptsMimetypes = new Set([
   'module'
 ]);
 
-function isScriptTypeAttribute(attrValue) {
+function isScriptTypeAttribute(attrValue = "") {
   attrValue = trimWhitespace(attrValue.split(/;/, 2)[0]).toLowerCase();
   return attrValue === '' || executableScriptsMimetypes.has(attrValue);
 }
 
-function keepScriptTypeAttribute(attrValue) {
+function keepScriptTypeAttribute(attrValue = "") {
   attrValue = trimWhitespace(attrValue.split(/;/, 2)[0]).toLowerCase();
   return keepScriptsMimetypes.has(attrValue);
 }
@@ -189,7 +189,7 @@ function isExecutableScript(tag, attrs) {
   return true;
 }
 
-function isStyleLinkTypeAttribute(attrValue) {
+function isStyleLinkTypeAttribute(attrValue = "") {
   attrValue = trimWhitespace(attrValue).toLowerCase();
   return attrValue === '' || attrValue === 'text/css';
 }

I'll be happy to open a PR if this is an acceptable fix.

KartikSoneji avatar Aug 18 '22 20:08 KartikSoneji

Hi @KartikSoneji,

Thanks for reporting this.

Personally I have never seen type without any ="...".

Can you provide some examples and use cases, where something like this appears or is used?

DanielRuf avatar Aug 18 '22 21:08 DanielRuf

https://www.w3.org/TR/REC-html40/interact/scripts.html#h-18.2.1 says:

" type = content-type [CI] This attribute specifies the scripting language of the element's contents and overrides the default scripting language. The scripting language is specified as a content type (e.g., "text/javascript"). Authors must supply a value for this attribute. There is no default value for this attribute. "

DanielRuf avatar Aug 18 '22 21:08 DanielRuf

It seems WHATWG has a different specification:

https://html.spec.whatwg.org/multipage/scripting.html#the-script-element

The type attribute allows customization of the type of script represented:

  • Omitting the attribute, setting it to the empty string, or setting it to a JavaScript MIME type essence match, means that the script is a classic script, to be interpreted according to the JavaScript Script top-level production. Classic scripts are affected by the async and defer attributes, but only when the src attribute is set. Authors should omit the type attribute instead of redundantly setting it.

  • Setting the attribute to an ASCII case-insensitive match for the string "module" means that the script is a JavaScript module script, to be interpreted according to the JavaScript Module top-level production. Module scripts are not affected by the defer attribute, but are affected by the async attribute (regardless of the state of the src attribute).

  • Setting the attribute to any other value means that the script is a data block, which is not processed. None of the script attributes (except type itself) have any effect on data blocks. Authors must use a valid MIME type string that is not a JavaScript MIME type essence match to denote data blocks.

Omitting the attribute, setting it to the empty string, or setting it to a

This still does not cover your example (attribute but no value).

DanielRuf avatar Aug 18 '22 21:08 DanielRuf

Afaik type without any value would be a boolean. Not an empty or normal string (at least this is how attributes like checked, required and disabled work).

DanielRuf avatar Aug 18 '22 21:08 DanielRuf

Hi @DanielRuf

Personally I have never seen type without any ="...".

Me too, this is definitely not valid HTML, nor makes much sense.

Can you provide some examples and use cases, where something like this appears or is used?

I actually found this (bug?) while debugging a webpack build crash. Someone had accidentally removed the ="image/png" part of the favicon meta tag.

While I agree this is a rather niche use-case, debugging this was not exactly fun. Also, I don't think the crashing on that type of input is the best course of action. At the very least there can be a better error message to help find where the exception is coming from.

KartikSoneji avatar Aug 18 '22 21:08 KartikSoneji

Hi, is there any update on the issue?

KartikSoneji avatar Sep 03 '22 19:09 KartikSoneji

If treating type the same as type="" is acceptable, then this results in a green CI build:

Not sure.

I'll be happy to open a PR if this is an acceptable fix.

A PR is very welcome. The current maintainers can then review the proposed changes.

DanielRuf avatar Sep 03 '22 19:09 DanielRuf