Escape any HTML special characters in GraphViz HTML
YAML and Python accept special characters (<, >, etc.) within strings without problems, but they cause issues when such a string is embedded inside the GraphViz HTML. Therefore, these characters should be escaped (&lt;, &gt;, etc.) when generating GraphViz HTML.
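A minimal sketch of the escaping step, using `html.escape` from the Python standard library. The helper name `graphviz_html_cell` and the one-cell table layout are only assumptions for illustration, not existing WireViz code:

```python
import html


def graphviz_html_cell(text: str) -> str:
    """Wrap text in a one-cell Graphviz HTML-like label (hypothetical helper)."""
    # quote=False replaces only &, < and >; quotes are left untouched.
    escaped = html.escape(text, quote=False)
    # The outer <...> pair is what marks a Graphviz label as HTML-like.
    return f"<<table><tr><td>{escaped}</td></tr></table>>"


print(graphviz_html_cell("AWG <= 22 & shielded"))
# <<table><tr><td>AWG &lt;= 22 &amp; shielded</td></tr></table>>
```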
We need clear rules on how to handle such characters when generating the different output formats, as they have different limitations:
- `.bom.tsv` does not support TAB, CR or LF in the text fields. No hyperlinks or formatting tags are supported as such.
- `.gv` designators probably have the same limitations as above, and also cannot contain characters that are interpreted as other syntax elements by Graphviz unless quoted.
- `.gv` HTML in labels supports a limited set of formatting tags and no hyperlinks in the text (only as table attributes). TAB, CR and LF might improve file readability, but are treated the same as a space when rendered. `<br/>` is needed to force a line break. See doc.
- `.html` supports hyperlinks and a wider set of formatting tags. TAB, CR and LF might improve file readability, but are treated the same as a space when rendered. `<br/>` is needed to force a line break.
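As a rough illustration of the first two cases, here is a sketch of per-format clean-up for TSV fields and Graphviz designators. Both function names are hypothetical, and the quoting only handles embedded double quotes, which is the one escape the DOT language defines inside quoted IDs:

```python
import re


def clean_tsv_field(text: str) -> str:
    """Replace each run of TAB, CR or LF with a single space (hypothetical helper)."""
    return re.sub(r"[\t\r\n]+", " ", text)


def quote_gv_id(text: str) -> str:
    """Return text as a double-quoted Graphviz ID (hypothetical helper)."""
    # Inside a quoted DOT ID, only the double quote itself needs escaping.
    return '"' + text.replace('"', '\\"') + '"'


print(clean_tsv_field("Cable\tW1\r\nshielded"))  # Cable W1 shielded
print(quote_gv_id('X1 "main"'))                  # "X1 \"main\""
```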
I agree that we should probably escape HTML special characters and convert newlines to `<br/>` (and perhaps `replace('\u00b2', '&sup2;')`) for the two HTML output formats. However, it should also be possible to disable all this when the user wants to include hyperlinks or formatting tags (e.g. bold or italic) that are used in the output formats that support them and filtered out in the other output formats (already partly implemented in #164). I guess we need a way to specify which of these two alternatives to apply to each input text attribute.
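To make the two alternatives concrete, here is a sketch under the assumption that a boolean decides between them. The names `to_html_text` and `strip_tags` are hypothetical, and the tag stripping is a naive regex rather than what #164 actually implements:

```python
import html
import re


def to_html_text(text: str, escape: bool = True) -> str:
    """Prepare text for an HTML output format (hypothetical helper)."""
    if not escape:
        # Second alternative: the user supplies their own tags, so leave it alone.
        return text
    # First alternative: escape &, < and >, then turn newlines into <br/>.
    return html.escape(text, quote=False).replace("\n", "<br/>")


def strip_tags(text: str) -> str:
    """Naively drop anything that looks like a tag, for formats without markup."""
    return re.sub(r"<[^>]+>", "", text)


print(to_html_text("0.25 mm < 0.5 mm\nshielded"))  # 0.25 mm &lt; 0.5 mm<br/>shielded
print(to_html_text("<b>GND</b>", escape=False))    # <b>GND</b>
print(strip_tags("<b>GND</b> wire"))               # GND wire
```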
Is it possible to have a leading specifier flag in the attribute text to specify the non-default alternative? E.g. text attributes with a leading < character (or perhaps something like <!wireviz!> is better) to specify the second alternative.
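Detecting such a leading flag could look roughly like this. The marker string is the one suggested above, while `RAW_MARKER` and `split_marker` are hypothetical names, not an existing WireViz API:

```python
from typing import Tuple

RAW_MARKER = "<!wireviz!>"  # assumed marker, as proposed above


def split_marker(text: str) -> Tuple[bool, str]:
    """Return (is_raw, body): is_raw is True when text starts with the marker."""
    if text.startswith(RAW_MARKER):
        # Non-default alternative: strip the marker and keep the rest verbatim.
        return True, text[len(RAW_MARKER):]
    # Default alternative: the caller escapes the text as usual.
    return False, text


print(split_marker("<!wireviz!><b>GND</b>"))  # (True, '<b>GND</b>')
print(split_marker("a < b"))                  # (False, 'a < b')
```

A dedicated marker like this avoids misreading ordinary text that merely happens to start with `<`, which a bare leading `<` flag would not.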
See also https://github.com/formatc1702/WireViz/pull/168#issuecomment-776115110