HTML/XML standards as they pertain to entities and printf %H
https://github.com/att/ast/pull/1431
The apos entity isn't defined in HTML 4: https://www.w3.org/TR/html4/sgml/entities.html#h-24.4.1 Notice that quot, lt, gt, and amp are listed there, but apos isn't.
Only five entities are predefined in XML (lt, gt, amp, apos, and quot), and nbsp is not one of them: https://www.w3.org/TR/xml/#sec-predefined-ent
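To make the XML side concrete, here is a minimal sketch of a lookup over the five predefined XML entities. The helper name `is_xml_predefined` is hypothetical and does not exist in ksh; it only illustrates why nbsp cannot be emitted into plain XML without a DTD that declares it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The five entities predefined by XML 1.0. */
static const char *const xml_predefined[] = { "lt", "gt", "amp", "apos", "quot" };

/* Hypothetical helper (not part of ksh): returns true if `name` can be
 * referenced in plain XML without any DTD declaring it. */
bool is_xml_predefined(const char *name)
{
    size_t count = sizeof xml_predefined / sizeof xml_predefined[0];

    for (size_t i = 0; i < count; i++)
        if (strcmp(name, xml_predefined[i]) == 0)
            return true;
    return false;
}
```

With this, `is_xml_predefined("apos")` is true while `is_xml_predefined("nbsp")` is false, which is the mismatch described above.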
For a more practical demonstration, take the attached file html4apos.txt, rename it to have a .html extension (GitHub disallows .html attachments), and feed it to https://validator.w3.org/ Notice the error.
Right now, the fmthtml function in ksh ( https://github.com/att/ast/blob/master/src/cmd/ksh93/bltins/print.c#L415 ) always produces both nbsp and apos entities. The resulting output can only be valid as XHTML: it is not valid plain XML (which has no nbsp) and not valid HTML 4 (which has no apos).
It also converts a lot of characters (like the tab character) to numeric character references, which is unnecessary and often undesirable, especially for human readability of an HTML file.
Even where nbsp is valid, converting space characters to nbsp doesn't really make sense. nbsp and regular spaces have different meanings in HTML: a no-break space prevents a line break at that position, while a regular space allows one.
As proposed in https://github.com/att/ast/pull/1431 I think these lines should be deleted: https://github.com/att/ast/blob/ffbfaf6ff4fd0cb37505b78b02cb1af9eac04eb2/src/cmd/ksh93/bltins/print.c#L434-L439
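As a rough illustration of the behaviour argued for here, the sketch below escapes only the characters whose entities are common to HTML 4, XHTML, and XML, and passes spaces and tabs through unchanged. It is not the fmthtml code and not the change in the PR: the function name `escape_markup` and its snprintf-style interface are assumptions made for the example, and using the numeric reference `&#39;` for the apostrophe is one possible choice (it could also be left unescaped outside single-quoted attribute values).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not ksh's fmthtml(). Copies src to dst, escaping
 * only characters that need it in HTML 4, XHTML, and XML alike.
 * Like snprintf, it always NUL-terminates when dstsize > 0 and returns
 * the length the complete result would need, so a return value
 * >= dstsize signals truncation. */
size_t escape_markup(const char *src, char *dst, size_t dstsize)
{
    size_t need = 0; /* length of the full escaped string */
    size_t used = 0; /* bytes actually written to dst */

    for (; *src; src++) {
        char one[2] = { *src, '\0' };
        const char *rep;
        size_t len;

        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break; /* assumption: numeric form instead of &apos; */
        default:   rep = one;      break; /* spaces, tabs, etc. are left as-is */
        }
        len = strlen(rep);
        /* Write only while nothing has been dropped and room remains for the NUL. */
        if (need == used && dstsize > 0 && used + len < dstsize) {
            memcpy(dst + used, rep, len);
            used += len;
        }
        need += len;
    }
    if (dstsize > 0)
        dst[used] = '\0';
    return need;
}
```

For example, passing the input `a<b & 'c'` with a sufficiently large buffer yields `a&lt;b &amp; &#39;c&#39;`, which uses no nbsp or apos entity and so should be acceptable to HTML 4, XHTML, and XML parsers alike.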