commonmark.js

Source information missing from parsed nodes

Open rix0rrr opened this issue 6 years ago • 2 comments

I'm building a renderer that modifies the tree in place but can still round-trip Markdown back to the same (or mostly the same) Markdown.

Some information that would be useful for reconstructing the original document is missing from the parsed nodes. In particular:

  • Whether a code_block was originally fenced or indented.
  • What symbol was used for emphasis, either * or _.

Can these be added to the parse nodes?
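
Until something like this exists on the nodes, one possible workaround for the code_block case is to look back at the original source using the block's source position. The sketch below is a heuristic under stated assumptions, not part of commonmark.js: describeCodeBlock is a hypothetical helper, and it assumes the sourcepos shape [[startLine, startCol], [endLine, endCol]] (1-based) that the parser reports for block nodes, with the start column pointing at the opening fence for fenced blocks. An indented block whose first line happens to begin with a fence-like run would be misclassified, and tabs are not handled.

```javascript
// Hypothetical helper: classify a code block by inspecting its first source line.
// `sourcepos` is assumed to be [[startLine, startCol], [endLine, endCol]], 1-based.
function describeCodeBlock(source, sourcepos) {
  const [[startLine, startCol]] = sourcepos;
  const line = source.split(/\r?\n/)[startLine - 1] || "";
  // Skip to where the block itself begins (e.g. past a "> " or a list marker).
  const rest = line.slice(startCol - 1);
  // A fence is a run of at least three backticks or three tildes.
  const match = /^(`{3,}|~{3,})/.exec(rest);
  if (!match) {
    return { fenced: false, fenceChar: null, fenceLength: 0 };
  }
  return { fenced: true, fenceChar: match[1][0], fenceLength: match[1].length };
}
```

This only covers block-level nodes. It does not help with the emphasis marker, since that would need position information for inline nodes as well.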

rix0rrr avatar Sep 06 '19 13:09 rix0rrr

I'm unsure about this, because this is just the tip of the iceberg. Was a character escaped or not in the source? How much indentation was used before a list item? How many backticks were used in the code block? Was a reference or inline link used? Etc. Once you start going down this path, you're looking at a very different kind of parser output, a concrete syntax tree rather than an abstract one.

jgm avatar Sep 06 '19 15:09 jgm

@rix0rrr

  • Node has a property _isFenced, but no marker character, so you cannot tell whether the fence marker is ` or ~
  • Emphasis has no marker character

You can add these on your fork. I wrote a parser modeled on commonmark.js that records this kind of 'meta info'; you may want to take a look.
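
As a rough illustration of how a round-tripping renderer might consume such meta info once a fork records it, here is a minimal sketch. It works on plain objects rather than commonmark.js Node instances, and the fields marker, fenced, fenceChar, and fenceLength are assumed names for this example only, not an existing API.

```javascript
// Hypothetical node shape: plain objects, not commonmark.js Node instances.
// `marker`, `fenced`, `fenceChar`, and `fenceLength` are assumed extra fields.
function renderNode(node) {
  switch (node.type) {
    case "text":
      return node.literal;
    case "emph": {
      // Fall back to "*" when the parser did not record the original marker.
      const marker = node.marker || "*";
      return marker + node.children.map(renderNode).join("") + marker;
    }
    case "code_block": {
      if (!node.fenced) {
        // Indented form: prefix every non-empty content line with four spaces.
        return node.literal.replace(/^(?=.)/gm, "    ");
      }
      const fence = (node.fenceChar || "`").repeat(node.fenceLength || 3);
      return fence + (node.info || "") + "\n" + node.literal + fence + "\n";
    }
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}
```

Without the recorded fields the renderer has to pick a default, so _text_ in the source would come back as *text*, which is exactly the loss described in the original report.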

88250 avatar Sep 06 '19 15:09 88250