
HTML/XML standards as they pertain to entities and printf %H

Open ghost opened this issue 4 years ago • 3 comments

https://github.com/att/ast/pull/1431

The apos entity isn't defined in HTML 4: https://www.w3.org/TR/html4/sgml/entities.html#h-24.4.1 Notice that quot, lt, gt, and amp are listed there, but apos isn't.

XML predefines only five entities (amp, lt, gt, apos, and quot), and nbsp is not one of them: https://www.w3.org/TR/xml/#sec-predefined-ent

For a more practical demonstration, take the attached file html4apos.txt, rename it so that it has a .html rather than a .txt extension (GitHub disallows .html attachments), and feed it to https://validator.w3.org/ Notice the error.

ghost avatar Nov 07 '19 11:11 ghost

Right now, the fmthtml function in ksh does this: https://github.com/att/ast/blob/master/src/cmd/ksh93/bltins/print.c#L415 It always produces both nbsp and apos entities, so its output can only be valid as XHTML. It is valid neither as plain XML (which does not predefine nbsp) nor as HTML 4 (which does not define apos).

It also converts a lot of characters (like the tab character) to numeric character references, which is unnecessary and often undesirable, especially because it hurts the human readability of the resulting HTML file.
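To make the alternative concrete, here is a minimal sketch of an escaper whose output should be acceptable in both HTML 4 and plain XML. It is not the ksh fmthtml code: the function name `escape_markup` and its buffer-based signature are hypothetical, chosen only for illustration. It escapes only the characters that are significant in markup, writes the apostrophe as the numeric reference `&#39;` (which both HTML 4 and XML accept, unlike `&apos;`), and passes tabs and spaces through unchanged.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the ksh fmthtml implementation.
 * Copies src into dst, escaping only the markup-significant characters.
 * The apostrophe becomes the numeric reference &#39; because &apos; is
 * not defined in HTML 4. Spaces and tabs are copied unchanged, so no
 * &nbsp; (which XML does not predefine) is ever produced.
 * Returns the number of bytes written, excluding the terminating NUL.
 * Stops early, without splitting a reference, if dst would overflow.
 */
static size_t escape_markup(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    for (; *src; src++) {
        char one[2] = { *src, '\0' };   /* fallback: the character itself */
        const char *rep;

        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:   rep = one;      break;
        }

        size_t len = strlen(rep);
        if (used + len + 1 > dstsize)   /* keep room for the NUL */
            break;
        memcpy(dst + used, rep, len);
        used += len;
    }

    if (dstsize > 0)
        dst[used] = '\0';
    return used;
}
```

Calling `escape_markup("it's <b>", buf, sizeof buf)` with a sufficiently large `buf` leaves `it&#39;s &lt;b&gt;` in the buffer. Whether quot should be escaped outside attribute values is a separate design choice; it is escaped here so the result is safe in either context.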

ghost avatar Nov 07 '19 11:11 ghost

Even where nbsp is valid, converting space characters to nbsp doesn't make sense. nbsp and regular spaces have different meanings in HTML: a no-break space prevents a line break at that position and is not collapsed with adjacent whitespace.

ghost avatar Nov 07 '19 11:11 ghost

As in: https://github.com/att/ast/pull/1431 I think these lines should be deleted: https://github.com/att/ast/blob/ffbfaf6ff4fd0cb37505b78b02cb1af9eac04eb2/src/cmd/ksh93/bltins/print.c#L434-L439

ghost avatar Nov 09 '19 20:11 ghost