tsdoc icon indicating copy to clipboard operation
tsdoc copied to clipboard

RFC: Replace `DocHtmlStartTag` and `DocHtmlEndTag` with `DocXmlElement`

Open xiaoxiangmoe opened this issue 5 years ago • 7 comments

Markdown table or HTML table element can't be rendered correctly in TSDoc playground

xiaoxiangmoe avatar Jan 04 '19 06:01 xiaoxiangmoe

I wanted to enable general HTML tags, however for security reasons it requires a script sanitizer such as sanitize-html or xss. @iclanton could we add that to the loader?

octogonz avatar Jan 07 '19 23:01 octogonz

dompurify looks to be more self-contained, and is hosted by cdnjs.com.

octogonz avatar Jan 08 '19 00:01 octogonz

@iclanton started a PR https://github.com/Microsoft/tsdoc/pull/139 to implement this.

However, when the playground tries to render DocHtmlStartTag and DocHtmlEndTag as React virtual DOM, there isn't any way to render half of a tag. There's a proposal for React.Fragment to support dangerouslySetInnerHTML , but it's not implemented yet. (API Extractor never encountered this problem because it renders TSDoc as Markdown instead of React, so we can simply append arbitrary text to the markdown string ignoring the tree structure.)

We considered trying to match up DocHtmlStartTag and DocHtmlEndTag during rendering, so we could create a corresponding React element, and then reattach the appropriate child nodes. But that seems messy. We realized that the TSDoc parser itself can do this fairly easily, if we require the input HTML to be well-formed XML instead of the messy ad hoc structures that CommonMark allows. This appeals to me, because when people need to embed nesting metadata in TSDoc, I generally recommend HTML5 custom elements instead of custom TSDoc inline tags, since XML is by design extensible and interoperable, whereas inline tags use a highly proprietary syntax.

Thus, I'm proposing to replace DocHtmlStartTag and DocHtmlEndTag in the TSDoc API with a new class called DocXmlElement. This will make HTML elements "first class" citizens in the TSDoc spec, and make them easier for documentation tools to work with.

octogonz avatar Jan 08 '19 02:01 octogonz

Hi @octogonz, I would be interested in implementing a DocXmlElement. However, I have few specific questions:

Would we still want to store the opening and closing delimiters for both the start and end tags within DocXmlElement or just store the start tag and end tag excerpts?

~~Would a DocXmlElement extend a DocNodeContainer? From a semantics point-of-view, XML nodes can only contain other xml nodes (or plain text), having any other tsdoc tags within them is invalid.~~ BuiltInNodes validates this anyways so I would assume DocNodeContainer would suffice.

Finally in the case of plain text within an XML node (<foo>hello world!</foo>), should we introduce another DocXmlPlainText class to represent these children?

suneettipirneni avatar Aug 01 '22 13:08 suneettipirneni

Hey @suneettipirneni, that would be awesome to finally get this implemented! I haven't thought about it in a long time.

Would we still want to store the opening and closing delimiters for both the start and end tags within DocXmlElement or just store the start tag and end tag excerpts?

Hmm... I haven't thought about this in a long time. 🙂 I think the AST is designed with excerpts as the leaf nodes, so it would be something like this:

Sample input:

<a>hi<b /></a>

AST pseudocode:

- DocXmlElement
  * excerpts for: "<a>"
  - DocPlainText
    * excerpts for: "hi"
  - DocXmlElement
    * excerpts for: "<b />"
  * excerpts for: "</a>"

Note that "</a>" is an excerpt belonging to the root DocXmlElement.

An AST visitor would be interested in the meaningful nodes DocXmlElement, DocPlainText, etc, whereas the excerpts are extra coordinates used for only for specialized operations such as syntax highlighting, or surgically modifying a document while preserving its original formatting.

BuiltInNodes validates this anyways so I would assume DocNodeContainer would suffice.

Yes, it would be a DocNodeContainer with those constraints. A key idea of DocNodeContainer is that it's possible for a user to introduce custom nodes that adjust these constraints. Some examples here: api-documenter/src/nodes/CustomDocNodeKind.ts

Finally in the case of plain text within an XML node (<foo>hello world!</foo>), should we introduce another DocXmlPlainText class to represent these children?

Yes.

Also keep in mind that the AST is intended to represent syntax errors as well, faithfully enough that it can regenerate the original input. Consider this example:

/**
 * <a>b</c>
 */

Maybe it would get represented as:

- (parent)
  - DocXmlElement
    * excerpts for: "<a>"
  - DocPlainText
    * excerpts for: "b"
  - DocXmlElement
    * excerpts for: "</c>"

Here "b" becomes a sibling of <a> instead of a child, and </c> is a degenerate DocXmlElement representing only an orphaned closing tag. (Or maybe </c> is DocErrorText? It depends on how much of the structure is useful, for example maybe we still want to syntax highlight </c> correctly.)

A web HTML parser has quite sophisticated heuristics that would try to render a sensible web page from broken syntax, for example inferring </a> so that b can be a child of <a>. But TSDoc doesn't need to worry about that in my opinion. Our priority is to implement a behavior that is simple and predictable and intuitive. (We don't face the problem of cross-compatibility with Chrome/Firefox/Safari/IE/etc specifically because TSDoc intends to be a strict standard with well-defined grammar rules, more like XML than HTML.)

octogonz avatar Aug 02 '22 09:08 octogonz

Hi @octogonz, I made a draft pr for this. If you could give me any feedback on this initial implementation, it would be greatly appreciated 😄.

suneettipirneni avatar Aug 04 '22 16:08 suneettipirneni

Much appreciated! I apologize for the slow response -- I've been really busy this week. I will take a look over the weekend. Thanks for patience!

octogonz avatar Aug 04 '22 21:08 octogonz