parse5
                                
                                
                                
                                    parse5 copied to clipboard
                            
                            
                            
                        `TreeAdapter` interface types are unsound
summary
The current API allows customizing parsing behavior using the TreeAdapter and TreeAdapterTypeMap interfaces, which I think is very cool. However the type declarations on these interfaces are overly relaxed; in particular, the parsing logic implicitly expects certain node types to be related to each other (e.g., every element is a parentNode), but these relationships are not at all enforced by the TreeAdapter type declarations. Consequently, the types of arguments passed by the parse5 internal parsing code to the custom TreeAdapter may have completely inconsistent types with those expected by the TreeAdapter's own type declarations.
extreme example
To illustrate my point with an extreme example, consider the following, which is semantically not a faithful/good tree adapter but nevertheless type-checks without error (on parse5 version 7.1.2, TS 5.4.5). In particular, each "node type" is just a literal string type.
import * as P5 from "parse5";
const minimalAdapter: P5.TreeAdapter<{
  childNode: "child";
  commentNode: "comment";
  document: "document";
  documentFragment: "fragment";
  documentType: "doctype";
  element: `element/${string}`;
  node: "comment" | "text" | "doctype" | `element/${string}`;
  parentNode: "parent";
  template: "template";
  textNode: "text";
}> = {
  adoptAttributes() {},
  appendChild() {},
  createCommentNode() {
    return "comment";
  },
  createDocument() {
    return "document";
  },
  createDocumentFragment() {
    return "fragment";
  },
  createElement(tag) {
    console.log("create element", tag);
    return `element/${tag}`;
  },
  // `createTextNode` is part of the interface in parse5 master branch
  // but not yet in 7.1.2 release, which is what i'm testing on
  //   createTextNode(value: string){ return "text"; },
  detachNode() {},
  getAttrList() {
    return [];
  },
  getChildNodes() {
    return [];
  },
  getCommentNodeContent() {
    return "(commentContent)";
  },
  getDocumentMode() {
    return P5.html.DOCUMENT_MODE.NO_QUIRKS;
  },
  getDocumentTypeNodeName() {
    return "(doctypeName)";
  },
  getDocumentTypeNodePublicId() {
    return "(docTypePublicId)";
  },
  getDocumentTypeNodeSystemId() {
    return "(docTypeSystemId)";
  },
  getFirstChild() {
    return null;
  },
  getNamespaceURI() {
    return P5.html.NS.HTML;
  },
  getNodeSourceCodeLocation() {
    return null;
  },
  getParentNode() {
    return null;
  },
  getTagName() {
    return "(tagName)";
  },
  getTemplateContent() {
    return "fragment";
  },
  getTextNodeContent() {
    return "(textContent)";
  },
  insertBefore() {},
  insertText(parent: "parent", text) {
    console.log("insert text:", { parent, text });
  },
  insertTextBefore() {},
  isCommentNode(node): node is "comment" {
    return node === "comment";
  },
  isDocumentTypeNode(node): node is "doctype" {
    return node === "doctype";
  },
  isElementNode(node): node is `element/${string}` {
    return node.startsWith("element/");
  },
  isTextNode(node): node is "text" {
    return node === "text";
  },
  setDocumentMode() {},
  setDocumentType() {},
  setNodeSourceCodeLocation() {},
  setTemplateContent() {},
  updateNodeSourceCodeLocation() {},
};
P5.parseFragment("foobar", { treeAdapter: minimalAdapter });
extreme example explanation
Pay attention specifically to:
- the 
parentNodetype (which is just the literal string type,"parent") - the 
insertTextfunction, which per the official parse5 documentation takes in two arguments:
in this case,insertText(parentNode: T["parentNode"], text: string)T["parentNode"]is, per the last bullet point, the literal string type"parent". In the code example, we explicitly annotate this:insertText(parent: "parent", text) 
However, running the code results in the following output:
insert text: { parent: 'element/html', text: 'foobar' }
which is inconsistent with the type declarations, even though there is no type-checking error. This happens because the actual way that this parsing behind the scenes looks something like this (not exactly; I'm omitting some details to keep this discussion a little simpler, but I'm pretty sure the basic idea is right):
- First call 
createElementto create a "mock"htmlelement. - Then call 
insertTextwith the element we just created, and the text"foobar"(from the input). 
The problem here is that createElement produces a node of type element, while insertText expects a node of type parentNode, but there is no requirement that all elements are actually parentNodes (in technical lingo, element needs to be a subtype of a.k.a. extend parentNode), and indeed, that is the requirement we constructed this example to violate.
related problems
The example above showcases just one of many unspoken type requirements. There are probably many, but here are just a few more I can think of:
templates are created usingcreateElementas well, but then passed intosetTemplateContent, which expects its input to be typetemplate. By the reasoning I gave above this would require that allelements aretemplates, which doesn't make sense, so something else needs to be fixed to address this type-soundness issue besides merely imposing constraints on the types.documents are alsoparentNodes, I think- similar issues with 
childNodeandappendChild 
who cares?
You could argue that my example is pathological and unrealistic, and that's true. Indeed, most of these unspoken constraints are satisfied just fine by the default, built-in tree adapter, so for most people these type unsoundness errors are moot. Still, I think that if we're going to go to these lengths (custom tree adapters & adapter types) to make parse5 customizable anyway, then we might as well do it right, and currently the types are not right.
how do we fix it?
I'm not totally sure! Some of the issues are partially addressed by adding extends constraints to ensure that, e.g., Element extends ParentNode. However, adding these constraints to TreeAdapterTypeMap's type parameters is insufficient because this way they can be bypassed simply by directly constructing the type map via a literal, which is already what my example above does. So to really enforce these constraints they have to occur directly on TreeAdapter itself, which would require a very annoying rewrite that lifts all the node type parameters directly onto the TreeAdaptertype itself. That would make the code a lot more verbose and break backwards-compatibility.
On top of that, it also doesn't address the issue of templates, which arise from technically unsound use of createElement to create a template node type, given specific input arguments. Simply constraining Element extends Template can address the type errors but doesn't make sense and conceptually isn't what we want; the more correct fix would be instead to have a completely separate createTemplateNode method and use that in the parsing logic instead when constructing templates. Again, API-breaking change.
Anyway, I don't have a simple fix that doesn't break stability in lots of ways, but I'm raising this issue anyway for discussion because there is a real typing issue, and I do care a lot about types. If we ultimately don't implement/enforce these type constraints/requirements in TypeScript because of stability or other technical reasons, I think they deserve at least a mention in the documentation so other folks trying to write their own TreeAdapter know what compatibility requirements they need in order to avoid running into unexpected type errors. :)