Does the dx-spec data model need to inherit from unist/mdast?
So: the idea to be a CommonMark superset still seems very good to me. But it raises a question that has also been relevant to documentation.js: if you're parsing Markdown in your documentation.js parser, what does your data model / JSON output look like?
For documentation.js, my choice was mdast, the Markdown abstract syntax tree that remark reads and writes. That meant, though, that any tool that interprets documentation.js's output needs to know how to transform MDAST into HTML or Markdown or whatever.
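To make that cost concrete, here is a minimal sketch of what a consumer would have to carry just to turn a small mdast tree into HTML. The `toHtml` and `escapeHtml` helpers are hypothetical and handle only a handful of node types; a real consumer would more likely pull in a unified plugin than write this by hand.

```javascript
// Sketch only: renders a small subset of mdast node types to HTML.
// Handled types: root, paragraph, text, emphasis, strong, inlineCode, link.
function escapeHtml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function toHtml(node) {
  // Render all children of the current node and join the results.
  const children = () => (node.children || []).map(toHtml).join('');
  switch (node.type) {
    case 'root': return children();
    case 'paragraph': return `<p>${children()}</p>`;
    case 'text': return escapeHtml(node.value);
    case 'emphasis': return `<em>${children()}</em>`;
    case 'strong': return `<strong>${children()}</strong>`;
    case 'inlineCode': return `<code>${escapeHtml(node.value)}</code>`;
    case 'link': return `<a href="${escapeHtml(node.url)}">${children()}</a>`;
    default: throw new Error(`Unhandled mdast node type: ${node.type}`);
  }
}

// Hand-written tree for the Markdown source "See _foo_."
const exampleTree = {
  type: 'root',
  children: [{
    type: 'paragraph',
    children: [
      { type: 'text', value: 'See ' },
      { type: 'emphasis', children: [{ type: 'text', value: 'foo' }] },
      { type: 'text', value: '.' },
    ],
  }],
};

const exampleHtml = toHtml(exampleTree);
```

Every node type the renderer does not know about becomes an error or a silent gap, which is the burden an AST-based output places on each downstream tool.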
Thoughts here:
- remark isn't the most popular solution on the block - markdown-it or remarkable or commonmark.js seem to be. marked is out of the running because it's not CommonMark-compliant at all. But remark is the only tool I've found that focuses on the AST and tooling layer first and foremost, and which has a good, understandable AST.
- remark isn't 100% CommonMark-compliant https://github.com/wooorm/remark/issues/306 - but it's pretty darn close. It at least tries to be, as opposed to marked, which is opinionated about not being CommonMark-compliant.
- We could just output Markdown strings in our JSON output, but in that case, you push the requirement for, for instance, #19 cross-references to work, down the line to the output tool.
I'm not sure which way to go here. Adding an AST into the output feels wrong, in that it's bulky and opinionated, but adding strings feels like it defers too much responsibility to the output layer.
cc @wooorm, king of the unified kingdom
> We could just output Markdown strings in our JSON output, but in that case, you push the requirement for, for instance, #19 cross-references to work, down the line to the output tool
If this was done in a standardized fashion (e.g. with a custom URL scheme for cross-reference link URLs, dx://...), then a downstream tool could choose to parse the Markdown into an AST if that was useful for its purposes, but wouldn't necessarily have to.
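A rough sketch of what that could look like for a consumer that never builds an AST. Everything here is an assumption for illustration: the `dx://Foo.bar` reference format, the `resolveCrossReferences` name, and the slug rule, which only approximates what a README renderer might generate.

```javascript
// Hypothetical: rewrite dx:// link destinations in a Markdown string.
// The regex only targets inline links of the form [text](dx://ref); it does
// not handle reference-style links, titles, or links inside code spans.
function resolveCrossReferences(markdown, resolve) {
  return markdown.replace(
    /\]\(dx:\/\/([^)\s]+)\)/g,
    (match, ref) => `](${resolve(ref)})`
  );
}

// Assumed slug rule for a README target: lowercase, non-alphanumerics to "-".
const toReadmeAnchor = (ref) =>
  `#${ref.toLowerCase().replace(/[^a-z0-9]+/g, '-')}`;

const crossRefInput = 'Calls [bar](dx://Foo.bar) and returns a [Baz](dx://Baz).';
const crossRefOutput = resolveCrossReferences(crossRefInput, toReadmeAnchor);
```

A tool that does want an AST can still parse the string and treat `dx://` URLs as ordinary link nodes, so the string output does not rule that path out.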
I don't think that dx should return an AST at all. I think it should return an object like this:
{
  raw: "...# full _markdown_...",
  description: "...# partial _markdown_...",
  tags: [/* ...extracted built-in tags... */],
  customTags: [/* ...extracted custom tags... */],
  source: { filename: '...', loc: { start: { line, column }, end: { line, column } } },
  // ...what else?...
}
People can render the markdown bits using their own commonmark-compatible compilers.
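As a sketch of how a consumer might use such an object: the entry below follows the proposed shape, but its values and the tag fields (`title`, `name`, `type`) are made up for illustration, and `renderEntry` is a hypothetical helper. The Markdown renderer is passed in as a plain function, so any CommonMark-compatible compiler could be swapped in.

```javascript
// Assumed entry, following the proposed shape; all values are invented.
const sampleEntry = {
  raw: 'Adds _two_ numbers.\n\n@param {number} a',
  description: 'Adds _two_ numbers.',
  tags: [{ title: 'param', name: 'a', type: 'number' }],
  customTags: [],
  source: {
    filename: 'add.js',
    loc: { start: { line: 1, column: 0 }, end: { line: 3, column: 3 } },
  },
};

// Hypothetical consumer: renders the description with whatever
// string-to-HTML function the caller supplies.
function renderEntry(entry, renderMarkdown) {
  const { filename, loc } = entry.source;
  return {
    html: renderMarkdown(entry.description),
    tagTitles: entry.tags.concat(entry.customTags).map((tag) => tag.title),
    location: `${filename}:${loc.start.line}:${loc.start.column}`,
  };
}

// Stand-in renderer that only understands _emphasis_; a real consumer
// would pass its own Markdown compiler's render function here instead.
const tinyRenderer = (md) =>
  `<p>${md.replace(/_([^_]+)_/g, '<em>$1</em>')}</p>`;

const renderedEntry = renderEntry(sampleEntry, tinyRenderer);
```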
I think we should allow ourselves to "preprocess" that markdown to handle cross-references (#19) and such.
> I think we should allow ourselves to "preprocess" that markdown to handle cross-references (#19) and such.
For instance, this is one of the main reasons why documentation.js did return an AST - cross-references. If your final output is, for instance, Markdown in a README, you want to process cross-references into the id slugs that GitHub will generate for titles (or that you'll generate for anchors). If your final output is a complex HTML theme, the URL of a chunk of documentation might be completely different - it might be /methods/foo or /#foo or /?method=foo depending on your framework or API URL preferences. It's tricky for the intermediate documentation output to really tailor this without very explicit configuration.
I'm not sure we can do much other than have it be configured ahead of time. That seems way easier to deal with than handling an AST.
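One way the "configured ahead of time" option could look, as a sketch: the `linkStyle` option name, the strategy names, and `createLinker` are all hypothetical, and the three URL shapes are the ones mentioned above.

```javascript
// Hypothetical link strategies, one per URL style mentioned in the thread.
const linkStrategies = {
  anchor: (name) => `/#${encodeURIComponent(name)}`,
  path: (name) => `/methods/${encodeURIComponent(name)}`,
  query: (name) => `/?method=${encodeURIComponent(name)}`,
};

// Picks a strategy once, from configuration, before any docs are processed.
function createLinker(config) {
  const strategy = linkStrategies[config.linkStyle];
  if (!strategy) {
    throw new Error(`Unknown linkStyle: ${config.linkStyle}`);
  }
  return strategy;
}

const linkTo = createLinker({ linkStyle: 'path' });
const fooUrl = linkTo('foo');
```

The trade-off is that the intermediate output is then tied to one site layout; regenerating for a different theme means rerunning with a different configuration.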
For third-party modules I would link it to some service we can put up that permalinks documentation.
docs.js.org/npm/{name}/{version}/{...ref}
Which would pull from a docs.json file published to the npm registry.
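A small sketch of building such a permalink from the pattern above. The service does not exist yet, so the `https://` scheme, the `docsPermalink` name, and the per-segment encoding are assumptions; scoped package names are not given any special treatment here.

```javascript
// Assumption: the proposed service would be served over https at this base.
const DOCS_BASE = 'https://docs.js.org/npm';

// Builds {base}/{name}/{version}/{...ref}, encoding each path segment.
function docsPermalink(name, version, ...ref) {
  const segments = [name, version, ...ref].map(encodeURIComponent);
  return `${DOCS_BASE}/${segments.join('/')}`;
}

// Package name, version, and ref segments are illustrative values only.
const examplePermalink = docsPermalink('some-package', '1.2.3', 'Foo', 'bar');
```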
> cc @wooorm, king of the unified kingdom
Hi Tom et al! 👋
This is the first time I hear of dx, and it’s a bit to take in, so I’m not entirely sure how useful my thoughts are: they may be completely wrong.
> remark isn't the most popular solution on the block - markdown-it or remarkable or commonmark.js seem to be. [...]
When I read that I was sure you were right, but then I looked at download counts. It seems remark is 3rd most popular, after marked and markdown-it, by this metric. remark has more downloads than remarkable and commonmark.js!
If you just want to go markdown -> HTML, go for markdown-it. If you need the AST for inspection or transforms, choose remark.
> remark isn't 100% commonmark-compliant wooorm/remark#306 - but it's pretty darn close. [...]
Most of my thoughts are in that issue. To add though: commonmark is the best to base a spec on, because it actually is a markdown spec. But it’s also unstable, so you’ll deal with changes in dx through it anyway, until commonmark reaches a semver major.
> [...] Adding an AST into the output feels wrong, in that it's bulky and opinionated, but adding strings feels like it defers too much responsibility to the output layer.
I get that. Here are a few follow-up Qs: Are people supposed to customise the output? What plugins will the ecosystem create? How likely is it that remark is used internally to parse examples, tags, titles, descriptions, etc? What’s the output of dx supposed to be: HTML, markdown, something else?
> For instance, this is one of the main reasons why documentation.js did return an AST - cross-references
@tmcw Are there any other examples like this besides cross-references? I can't think of any. IMO, if it's really only (or mostly) about cross-references, then an AST is probably overkill, and I'd advocate for something like the custom URI scheme I mentioned above -- it's really not that hard to preprocess a chunk of markdown, replacing dx:// URIs with the appropriate thing (e.g. GH slugs or URLs to your docs site). On the other hand, if there are several other types of preprocessing that would also be required of most dx consumers, then an AST makes more sense than rendered markdown.