js-beautify icon indicating copy to clipboard operation
js-beautify copied to clipboard

HTML format breaks when text contains un-encoded <

Open RiFi2k opened this issue 5 years ago • 6 comments

Input

The code looked like this before beautification:

<!doctype html>
<html>

<body>
  <div>
    <button id="test">test < test</button>
  </div>
</body>

</html>

Expected Output

The code should have looked like this after beautification:

<!doctype html>
<html>

	<body>
		<div>
			<button id="test">test %3C test</button>
		</div>
	</body>

</html>

or

<!doctype html>
<html>

	<body>
		<div>
			<button id="test">test < test</button>
		</div>
	</body>

</html>

Actual Output

The code actually looked like this after beautification:

<!doctype html>
<html>

	<body>
		<div>
			<button id="test">test < test</button> </div> </body> </html>

This is using 1.9.0 on the online testing site.

I would assume you could take care of encoding the < while parsing. I personally think it would be fine to return it back encoded, but I'm not sure what your feelings on that part are, or if you even want to support it at all. I got an issue on it so I figured I would run it up the chain.

As a side note to save you a couple minutes I tried and having > in the text works as expected.

RiFi2k avatar Mar 22 '19 01:03 RiFi2k

@RiFi2k Beautifier doesn't change the text only the formatting. Hm, I guess the specific case of < followed by a space (or another < as reported in the original issue) could be detected and left unchanged. But if your input were test <test it would simply not be able to fix that. The beautifier doesn't have the smarts to do that.

bitwiseman avatar Mar 22 '19 02:03 bitwiseman

I didn't even think about the fact it would look at <test and beyond as the start of a new open tag. Makes sence for sure.

RiFi2k avatar Mar 22 '19 20:03 RiFi2k

By the way, W3 validator says this is not valid изображение

HanabishiRecca avatar Apr 14 '19 20:04 HanabishiRecca

For sure there is a good chance most people are just going to figure out to sub the < with an HTML entity once it screws up their formatting the first time.

RiFi2k avatar Apr 25 '19 16:04 RiFi2k

Does this seem related? Only using the format HTML part of the library.

<script>
    document.addEventListener("DOMContentLoaded", function(event) {
        GlobalVariables.taxonomy = <?= $taxonomy ?>;
    });
</script>

to

<script>
    document.addEventListener("DOMContentLoaded", function(event) {
         GlobalVariables.taxonomy = < ? = $taxonomy ? > ;
    });
</script>

Maybe possible to check for <? as well?

RiFi2k avatar Aug 27 '19 06:08 RiFi2k

hey @bitwiseman @RiFi2k , I was trying to fix the issue mentioned in the description and tried writing a failing test for the same here's the report:

---- expected-ws ------ <!doctype_html>\n <html>\n <body>\n <div>\n <button_id="test">test_<_test</button>\n </div>\n </body>\n </html>

---- output-ws ------ <!doctype_html>\n \n <html>\n <body>\n ____<div>\n ________<button_id="test">test_<_test</button>\n ____</div>\n </body>\n \n </html>

I'm not able to reproduce the exact failure, but instead getting some extra newline after doctype and extra spaces before divs and buttons. Not sure if I'm testing it correctly but please help me understand, thanks.

saharshg avatar Oct 01 '23 18:10 saharshg