
whitespaces between tags in `<div>` should be collapsed instead of being cleaned

Open Rongronggg9 opened this issue 2 years ago • 4 comments


input:

<div>
	<span>blablabla</span>
	<div>
		<span>blablabla</span>
        <b>Bold</b> <i>Italic</i> <u>Underlined</u>
		<a href="https://example.com">URL1</a>
		<a href="https://example.com">URL2</a>
		<a href="https://example.com">URL3</a>
		<a href="https://example.com">URL4</a>
	</div>
</div>

Firefox (screenshot not preserved):

Oops, minify_html messed it up:

<div><span>blablabla</span><div><span>blablabla</span><b>Bold</b><i>Italic</i><u>Underlined</u><a href=https://example.com>URL1</a><a href=https://example.com>URL2</a><a href=https://example.com>URL3</a><a href=https://example.com>URL4</a></div></div>

So the intended output should be:

<div><span>blablabla</span> <div><span>blablabla</span> <b>Bold</b> <i>Italic</i> <u>Underlined</u> <a href=https://example.com>URL1</a> <a href=https://example.com>URL2</a> <a href=https://example.com>URL3</a> <a href=https://example.com>URL4</a></div></div>

Note that the space between the first <span> and the inner <div> is needed, because a <div> can be floated or inline if the CSS sets it so. That's also true for <p>. (Unfortunately, if I replace the outer <div> with <p>, this space is still absent, so that is another bug.)

FYI: Python 3.9.10 (main, Feb 22 2022, 13:54:07) [GCC 11.2.0] on linux, minify-html 0.8.0

Rongronggg9, Mar 19 '22 12:03

As per the README, minify-html considers <div> a layout element that can only have layout or content elements (the README lists which elements belong to which category). In this case, you have used the formatting elements <span>, <b>, <i>, <u>, and <a> directly inside <div>, which breaks this assumption. I would suggest replacing all <div> usages with <p>, as that should be more semantically correct in this case.

It's true that using CSS can change a <div> to act like a content element and/or make it inline. However, this is true of any element. Therefore minify-html has to make assumptions (which are based on semantic best practices); otherwise it's impossible for minify-html to do anything. After all, any element could be like <pre> with the CSS: * { white-space: pre; }, in which case removing any whitespace is wrong.
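To make the category assumption concrete, here is a rough Python sketch of how a minifier could choose a whitespace rule from the parent element's category. It is not minify-html's actual algorithm: the class `WhitespaceSketch`, the helper `minify_sketch`, and the abbreviated `LAYOUT` and `CONTENT` sets are all made up for illustration and are not copied from the README. It only assumes that whitespace-only text is dropped inside layout elements, collapsed to one space elsewhere, and trimmed at the edges of layout and content elements.

```python
import html
import re
from html.parser import HTMLParser

WS = " \t\n\r\f"  # HTML (ASCII) whitespace; deliberately excludes U+00A0
# Abbreviated, assumed category sets for illustration only.
LAYOUT = frozenset({"div", "section", "article", "main", "ul"})
CONTENT = frozenset({"p", "h1", "h2", "h3"})
VOID = frozenset({"br", "hr", "img", "input", "meta", "link"})


class WhitespaceSketch(HTMLParser):
    """Toy whitespace minifier. Ignores <pre>, <script>, <style>, comments."""

    def __init__(self, layout=LAYOUT, content=CONTENT):
        super().__init__()  # default convert_charrefs=True: text arrives decoded
        self.layout = frozenset(layout)
        self.trimmed = self.layout | frozenset(content)
        self.stack = []        # names of currently open elements
        self.out = []          # output chunks: raw tags and escaped text
        self.at_start = False  # True right after a trimmed element opens

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.stack.append(tag)
        self.at_start = tag in self.trimmed

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, never pushed
        self.at_start = False

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        # Escaped text never starts with "<", so this means "last chunk is text".
        if tag in self.trimmed and self.out and not self.out[-1].startswith("<"):
            self.out[-1] = self.out[-1].rstrip(WS)
        self.out.append(f"</{tag}>")
        self.at_start = False

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else None
        if parent in self.layout and not data.strip(WS):
            return  # layout parent: whitespace-only text is removed entirely
        text = re.sub(f"[{WS}]+", " ", data)  # collapse runs to one space
        if self.at_start:
            text = text.lstrip(WS)  # trim at the start of the element
        self.at_start = False
        if text:
            self.out.append(html.escape(text, quote=False))


def minify_sketch(source, layout=LAYOUT, content=CONTENT):
    parser = WhitespaceSketch(layout, content)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


src = "<div>\n\t<span>a</span>\n\t<div>\n\t\t<b>b</b> <i>c</i>\n\t</div>\n</div>"
as_layout = minify_sketch(src)
as_content = minify_sketch(src, layout=LAYOUT - {"div"}, content=CONTENT | {"div"})
print(as_layout)   # <div><span>a</span><div><b>b</b><i>c</i></div></div>
print(as_content)  # <div><span>a</span> <div><b>b</b> <i>c</i></div></div>
```

In this sketch the only difference between the two runs is which set <div> belongs to. Treated as a layout element, every inter-tag space disappears; treated as a content element, one space survives between siblings.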

wilsonzlin, Mar 22 '22 01:03

> I would suggest replacing all <div> usages with <p>, as that should be more semantically correct in this case.

No, we can't do that: nested <p>s are invalid, whereas nested <div>s are valid. What's more, that HTML is not my work; I just use this lib to process tons of HTML from RSS feeds. (Maybe that is not an intended usage? haha) Many of them have <div>s like this. In my experience, a huge <div> block made up of formatting elements that are not wrapped in layout or content elements is not uncommon, though it is a bit strange, as you mentioned.

> minify-html considers <div> a layout element that can only have layout or content elements

However, as MDN documents, the permitted content of <div> is flow content. So having formatting elements directly inside <div>, not wrapped in layout or content elements, is totally valid and should be expected to a certain extent... I can understand your point: restricting the expected content to layout or content elements makes things much easier and the output more optimized, and most websites do obey this rule. If changing this rule would introduce much complexity, what about having an option to treat <div>s mostly like <p>s (since you suggested replacing all <div> usages with <p>)? Anyway, it's OK to make no change, and I agree that in most cases and for most users this is not really a problem.

Thanks for your answer and your fancy work. This lib is the best of the similar libs I've come across.

Rongronggg9, Mar 22 '22 04:03

Good point about the nested <p>, I forgot about that rule, which is strictly enforced by browsers. Normally I'd suggest changing the structure but it sounds like you are using minify-html for other content (i.e. out of your control). I'm open to the possibility of having a special configuration that changes the meaning of <div> so that it could contain formatting elements, as it is a very widely used "generic" element, and many will probably use it in the case you have mentioned.

wilsonzlin, Mar 22 '22 05:03

Thanks a lot for your minifier!

I came here for a similar situation.

Your differentiation between content, formatting and layout tags does make a lot of sense and promotes the use of semantic HTML.

Wouldn’t it make sense to have an option like

do_not_collapse_whitespace_for_consecutive_formatting_tags: true

This way

<div> <span>test</span> </div>

would be normally collapsed, but

<div> <span>test 1</span> <span>test 2</span> </div>

would be minified to:

<div><span>test 1</span> <span>test 2</span></div>

, and @Rongronggg9’s code to

<div><span>blablabla</span><div><span>blablabla</span> <b>Bold</b> <i>Italic</i> <u>Underlined</u> <a href=https://example.com>URL1</a> <a href=https://example.com>URL2</a> <a href=https://example.com>URL3</a> <a href=https://example.com>URL4</a></div></div>
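A minimal Python sketch of the rule this proposed option describes, assuming it means "keep exactly one space when a closing formatting tag is followed by an opening formatting tag, and drop all other inter-tag whitespace". The option does not exist in minify-html; `FORMATTING`, `TAG_GAP`, and `minify_gaps` are hypothetical names, the formatting set is an abbreviated guess, and the regex is a toy that ignores <pre>, comments, and attribute values containing ">".

```python
import re

# Assumed subset of formatting elements, for illustration only.
FORMATTING = frozenset({"a", "b", "i", "u", "em", "strong", "span"})

# A tag, the whitespace after it, and (as a lookahead) the start of the next tag.
TAG_GAP = re.compile(
    r"(<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*>)"  # previous tag, its slash, its name
    r"[ \t\n\r\f]+"                           # inter-tag whitespace
    r"(?=<(/?)([a-zA-Z][a-zA-Z0-9]*))"        # next tag's slash and name
)


def _gap(match):
    prev_tag, prev_slash, prev_name, next_slash, next_name = match.groups()
    # Keep one space only between </formatting> and <formatting>.
    keep = (
        prev_slash == "/"
        and next_slash == ""
        and prev_name.lower() in FORMATTING
        and next_name.lower() in FORMATTING
    )
    return prev_tag + (" " if keep else "")


def minify_gaps(source):
    """Drop whitespace between tags, except between consecutive formatting tags."""
    return TAG_GAP.sub(_gap, source)


print(minify_gaps("<div> <span>test</span> </div>"))
# <div><span>test</span></div>
print(minify_gaps("<div> <span>test 1</span> <span>test 2</span> </div>"))
# <div><span>test 1</span> <span>test 2</span></div>
```

Under this rule the space between a <span> and a following <div> is still removed, which matches the output shown above for @Rongronggg9’s code.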

Brixy, Mar 10 '23 23:03