html-minifier Feature Request: HTML Escaping

Use Case: I've got ~50 code examples on my site, in 6 languages. Those languages' native syntax causes Parse Errors in html-minifier because the samples aren't escaped. I've been escaping them, but that has become unmanageable and makes it really tough to run or update the code samples. While there are solutions that I've been looking into, I think one could be built into html-minifier:

options: {
    escapeFragments: [ /<pre><code>[\S\s]*<\/code><\/pre>/ ]
}

alternatively (though less preferable):

<code><!-- htmlmin:escape --><PARAM_HERE><!-- htmlmin:escape --></code>

I tried solving this using ignoreCustomFragments, which made the build pass, but browsers render the result incorrectly when the code resembles HTML open/close tags, such as Map<String, Object> batchMap = new HashMap<String, Object>();
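
For illustration, here is a minimal sketch of the kind of escaping step the proposed escapeFragments option would perform, written as a standalone function that could run before minification. It does not use or extend html-minifier's API: escapeHtml and escapeCodeBlocks are hypothetical names. The pattern uses a lazy quantifier (*?) and capture groups, an assumption made so that each <pre><code> block is matched on its own rather than everything between the first opening tag and the last closing tag.

```javascript
// Hypothetical helper: escape the characters that HTML treats as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // ampersands first, so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical pre-processing step: escape only the text inside <pre><code> blocks.
// Assumes the code samples are stored unescaped in the source files.
function escapeCodeBlocks(html) {
  const pattern = /(<pre><code>)([\s\S]*?)(<\/code><\/pre>)/g;
  return html.replace(
    pattern,
    (match, open, body, close) => open + escapeHtml(body) + close
  );
}

const input =
  "<p>Example:</p>" +
  "<pre><code>Map<String, Object> batchMap = new HashMap<String, Object>();</code></pre>";

console.log(escapeCodeBlocks(input));
// <p>Example:</p><pre><code>Map&lt;String, Object&gt; batchMap = new HashMap&lt;String, Object&gt;();</code></pre>
```

If the samples already contain entities such as &lt;, this sketch would escape their ampersands a second time, so it only suits sources that are kept fully unescaped.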

It seems like there have been enough issues posted about HTML entities that I'm not the only one who would benefit from a built-in solution: #446 #195 #282

Mar 11 '16 01:03 k-funk

Does the original HTML render correctly in a web browser?
Can you provide a concrete example input which demonstrates this issue?
Would you mind explaining the difference between your proposal and ?

Mar 11 '16 06:03 alexlamsl

Original HTML doesn't render correctly in a web browser
https://jsfiddle.net/p2t022rg/
ignoreCustomFragments/htmlmin:ignore allows html-minifier to bypass the Parse Errors that would occur, but doesn't do any escaping

Mar 11 '16 07:03 k-funk

Hmm... if the original HTML isn't recognised by any web browser in the first place, why would we expect html-minifier to handle it? The only exceptions I can think of are things like JSP/PHP, before they are processed server-side.

Btw, I did a quick lookup on Google and found this.

Mar 11 '16 07:03 alexlamsl

This new feature makes sense inside html-minifier because:

I already have to use the ignoreCustomFragments option, based on regex matching, on any (python, php, java, c-sharp, etc) code samples on my site, otherwise Parse Errors will occur.
Using another module to parse every file again and match that same regex seems really inefficient.
It makes sense to me for html-minifier, while parsing, to find code that needs to be ignored and, instead of skipping over it, encode everything that was matched. It seems like quite a few users have had this problem and would benefit more from encoding the content that causes Parse Errors than from ignoring it, which turns what could be a one-module solution into a two+ module solution.

Mar 11 '16 19:03 k-funk

So looking through the other issues you've mentioned, I wonder if what you meant is more like #591?

If so, would making our HTML parser more relaxed about non-escaped characters (which aims to match the web browser behaviour) work for you instead?

Mar 31 '16 05:03 alexlamsl

I don't think it would work, because I need < to be escaped when I have code samples like new HashMap<String, Object>. If the characters aren't escaped, the browser treats them as an HTML tag.

Apr 04 '16 17:04 k-funk

What about instead of supporting escaping (and then the next feature and next feature...) just support a custom function / callback?

The default function could just be ignore and then people that want to do something else can without implementing a complete tool.
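
The callback idea could look roughly like the sketch below. It is a standalone illustration, not html-minifier's API: processFragments, its patterns argument, and the handler callback are hypothetical names. The default handler returns each matched fragment unchanged (the "ignore" behaviour), and a caller who wants escaping passes their own function.

```javascript
// Hypothetical sketch: apply a user-supplied handler to every fragment matched by `patterns`.
// The default handler leaves the fragment untouched, which corresponds to "ignore".
function processFragments(html, patterns, handler = (fragment) => fragment) {
  return patterns.reduce((result, pattern) => {
    // Add the global flag if missing so every occurrence is handled, not just the first.
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    const globalPattern = new RegExp(pattern.source, flags);
    return result.replace(globalPattern, (fragment) => handler(fragment));
  }, html);
}

// Example handler: escape &, < and > in the body of a matched <code> element.
const escapeCode = (fragment) =>
  fragment.replace(
    /^(<code>)([\s\S]*)(<\/code>)$/,
    (match, open, body, close) =>
      open +
      body.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;") +
      close
  );

const page = "<p>Use <code>new HashMap<String, Object>()</code> here.</p>";
const codePatterns = [/<code>[\s\S]*?<\/code>/];

console.log(processFragments(page, codePatterns));
// <p>Use <code>new HashMap<String, Object>()</code> here.</p>
console.log(processFragments(page, codePatterns, escapeCode));
// <p>Use <code>new HashMap&lt;String, Object&gt;()</code> here.</p>
```

Keeping the default as a pass-through means existing ignore-style configurations would behave the same, while escaping or any other transformation stays in user code.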

Oct 23 '16 06:10 danielbodart

Oops just realised #382 is probably the answer

Oct 23 '16 06:10 danielbodart

Hello, sorry to interrupt your conversation, but I am also getting an HTML parse error. I am trying to parse <% if(inlineEdit){ %>, which is EJS syntax. In short, I am trying to minify an EJS file with html-minifier but have not been able to. Could you please share your views on this?

Nov 10 '17 13:11 Akshaykalola
