MarkdownView does not handle the self-closing part as expected.
message:
<tutorial-course-abandon-dismissal />
I see you're back on your user page! Perfect - now we're ready to continue with our course on creating a new project.
Let's go back to the homepage first, then we'll learn how to create a project. Please click the <highlight-link target-id="1ci8_wBp" tip="Click to go to homepage">Logo link</highlight-link> in the top navigation bar.
Once we're on the homepage, we'll use the Project menu to create our new project!
manifests as:
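Cause
After investigation: once fromMarkdown from mdast-util-from-markdown is called, the entire input above is recognized as a single html_block.
This behavior actually conforms to the CommonMark specification. It is not a parsing error; it follows from Markdown's semantic rules for blank lines around HTML.
Changing the input slightly:
Example 1: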
<tutorial-course-abandon-dismissal />

I see you're back on your user page! Perfect - now we're ready to continue with our course on creating a new project.
or
Example 2:
<tutorial-course-abandon-dismissal />I see you're back on your user page! Perfect - now we're ready to continue with our course on creating a new project.
With either change, the result is as expected.
In Example 1, an html_block ends at a blank line, so <tutorial-course-abandon-dismissal /> is an html_block and line 3 is a paragraph:
<html_block><tutorial-course-abandon-dismissal /></html_block>
<paragraph>
<text>I see you</text>
<text>'</text>
<text>re back on your user page</text>
<text>!</text>
<text> Perfect - now we</text>
<text>'</text>
<text>re ready to continue with our course on creating a new project.</text>
</paragraph>
In Example 2, an unrecognized tag (or a tag such as a or span) is recognized as html_inline and placed in the same paragraph as the rest of the line:
<paragraph>
<html_inline><tutorial-course-abandon-dismissal /></html_inline>
<text>I see you</text>
<text>'</text>
<text>re back on your user page</text>
<text>!</text>
<text> Perfect - now we</text>
<text>'</text>
<text>re ready to continue with our course on creating a new project.</text>
</paragraph>
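Coming back to the original problem: the input has no blank line between <tutorial-course-abandon-dismissal /> and the content below it. The tag line still opens an html_block, but nothing closes it, so the tag is neither isolated as its own html_block nor treated as html_inline. Instead, the whole content ends up in one html_block: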
<html_block><tutorial-course-abandon-dismissal />
I see you're back on your user page! Perfect - now we're ready to continue with our course on creating a new project.</html_block>
As a result, the outer layer cannot parse the content.
The ASTs above can be checked in the CommonMark playground.
Solutions
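- Extend mdast so that custom elements are parsed as html_inline
- Constrain the output format through the prompt
- Preprocessing: before the input is passed to fromMarkdown, add a blank line (or a backslash-escaped blank line) for the problematic cases above
- Postprocessing: operate on the result returned by fromMarkdown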
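The recommendation is to go with the preprocessing option: before the input is passed to fromMarkdown, add a blank line or a backslash-escaped blank line for the problematic cases above.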
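There should already be related logic here, such as preprocessCustomRawComponents and preprocessIncompleteTags, so adding one more step next to them should be enough. I am worried, though, that it could affect other content that is expected to stay inline (for example highlight-link), so that should be tested as well. If it does, we may need to split custom components further, from the current custom / customRaw into customInline / customBlock / customRaw, and apply this handling only to customBlock?
For now we use preprocessing: in this case a \n\n is inserted to force the tag and the following text apart. Since the problem only occurs when a self-closing tag is followed by a newline and then immediately by text, nothing like customInline has been added for the time being.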
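The preprocessing step described above could look roughly like the sketch below. The function name separateSelfClosingTagLines and the regular expression are hypothetical, not the project's actual code. The sketch assumes \n line endings and does not skip fenced code blocks, which a real implementation would probably need to handle.

```typescript
// Hypothetical helper: the name and the regex are assumptions for illustration.
// Matches a line that consists only of one self-closing tag,
// e.g. `<tag-name />` or `<tag-name attr="x" />`.
const SELF_CLOSING_TAG_LINE =
  /^[ \t]*<[A-Za-z][A-Za-z0-9-]*(?:\s+[^<>]*?)?\s*\/>[ \t]*$/;

function separateSelfClosingTagLines(markdown: string): string {
  const lines = markdown.split("\n");
  const out: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    out.push(lines[i]);
    const next = lines[i + 1];
    // Only act when the tag line is directly followed by a non-blank line.
    if (
      SELF_CLOSING_TAG_LINE.test(lines[i]) &&
      next !== undefined &&
      next.trim() !== ""
    ) {
      // Turns "\n" into "\n\n", so the html block ends right after the tag.
      out.push("");
    }
  }
  return out.join("\n");
}
```

Lines where the tag is followed by text on the same line, or where the tag sits inside a sentence (as with highlight-link), do not match the pattern and are left untouched.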
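mdast-util-from-markdown behavior analysis
Opening/closing tags
The tests are based on the following scenarios:
- Tag only
<tag-name></tag-name>
- Tag followed by text
<tag-name></tag-name>world
- Tag followed by a line break
<tag-name></tag-name>
world
- Text before and after the tag
hello<tag-name></tag-name>world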
Results (one line per scenario, in the order above):
- Unknown tags: inline
  - [paragraph -> [tag-name]]
  - [paragraph -> [tag-name, world]]
  - [paragraph -> [tag-name, \nworld]]
  - [paragraph -> [hello, tag-name, world]]
- Known tags
  - a, span: inline
    - [paragraph -> [span]]
    - [paragraph -> [span, world]]
    - [paragraph -> [span, \nworld]]
    - [paragraph -> [hello, span, world]]
  - div, p (raw): block
    - [html -> [div]]
    - [html -> [divworld]]
    - [html -> [div\nworld]]
    - [paragraph -> [hello, div, world]]
Self-closing tags
The tests are based on the following scenarios:
- Tag only
<tag-name />
- Tag followed by text
<tag-name />world
- Tag followed by a line break
<tag-name />
world
- Text before and after the tag
hello<tag-name />world
Results (one line per scenario, in the order above):
- Unknown tags: inline
  - [html -> [tag-name/]]
  - [paragraph -> [tag-name/, world]]
  - [html -> [tag-name/\nworld]]
  - [paragraph -> [hello, tag-name/, world]]
- Known tags
  - a, span: inline
    - [html -> [span/]]
    - [paragraph -> [span/, world]]
    - [html -> [span/\nworld]]
    - [paragraph -> [hello, span/, world]]
  - div, p: block
    - [html -> [div/]]
    - [html -> [div/world]]
    - [html -> [div/\nworld]]
    - [paragraph -> [hello, div/, world]]
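The results in the two sections above are consistent with two HTML block start conditions in the CommonMark specification. Condition 6 applies when a line begins with a block-level tag name such as div or p. Condition 7 applies when a line holds one complete open or closing tag and nothing else but spaces or tabs. The sketch below is a simplified model of those two rules, not the parser's implementation. predictFirstNode and BLOCK_TAGS are hypothetical names, BLOCK_TAGS is only an excerpt of the spec's list, and pre, script, style, textarea, comments and indentation are ignored.

```typescript
// Simplified model for illustration only; names are assumptions.
// Excerpt of the block-level tag names used by start condition 6.
const BLOCK_TAGS = new Set(["div", "p"]);

// Tag name right after `<` or `</`, followed by whitespace, `>`, `/>`
// or the end of the line (start condition 6).
const LEADING_TAG_NAME = /^<\/?([A-Za-z][A-Za-z0-9-]*)(?:\s|\/?>|$)/;

// A single complete open tag (optionally self-closing) or closing tag,
// followed only by spaces or tabs (start condition 7).
const OPEN_TAG_ONLY = /^<[A-Za-z][A-Za-z0-9-]*(?:\s+[^<>]*?)?\s*\/?>[ \t]*$/;
const CLOSE_TAG_ONLY = /^<\/[A-Za-z][A-Za-z0-9-]*\s*>[ \t]*$/;

type FirstNode = "html" | "paragraph";

function predictFirstNode(input: string): FirstNode {
  const firstLine = input.split("\n")[0];
  const name = LEADING_TAG_NAME.exec(firstLine)?.[1].toLowerCase();
  if (name === undefined) return "paragraph"; // line does not start with a tag
  if (BLOCK_TAGS.has(name)) return "html"; // condition 6: div, p, ...
  if (OPEN_TAG_ONLY.test(firstLine) || CLOSE_TAG_ONLY.test(firstLine)) {
    return "html"; // condition 7: the tag is alone on its line
  }
  return "paragraph"; // the tag becomes inline html inside a paragraph
}
```

Under this model, a self-closing tag on its own line opens an html block, while an opening/closing pair does not, because the closing tag after the open tag breaks condition 7.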
raw
The tests are based on the following scenarios:
- No content before the tag
<tag-name>
component1

component2
component3
</tag-name>
- Content before the tag
hello<tag-name>
component1

component2
component3
</tag-name>world
- Content after the tag
<tag-name>
component1

component2
component3
</tag-name>world
Results (one line per scenario, in the order above):
- pre
  - [html -> [<pre>\ncomponent1\n\ncomponent2\ncomponent3\n]]
  - [paragraph -> [hello, pre, ...], paragraph -> [..., world, /pre]]
  - [html -> [<pre>\ncomponent1\n\ncomponent2\ncomponent3\nworld]]
- div
  - [html, paragraph, html]
  - [paragraph -> [hello, div, ...], paragraph -> [..., world, /div]]
  - [html, paragraph, html]
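The difference between pre and div comes from the end conditions of HTML blocks in CommonMark: a block opened by pre runs until a line that contains </pre>, while a block opened by div runs until the first blank line. The sketch below models only these two end conditions. htmlBlockLineCount is a hypothetical name, and the sample inputs assume a blank line after component1, as the \n\n in the pre result suggests.

```typescript
// Simplified model for illustration only; the function name is an assumption.
// Assumes the input starts with an HTML block opened on its first line.
function htmlBlockLineCount(input: string): number {
  const lines = input.split("\n");
  const isPre = /^<pre(?:\s|>|$)/i.test(lines[0]);
  for (let i = 0; i < lines.length; i++) {
    if (isPre) {
      // pre: the block ends with the line that contains `</pre>`.
      if (/<\/pre>/i.test(lines[i])) return i + 1;
    } else if (lines[i].trim() === "") {
      // div and similar: the block ends before the first blank line.
      return i;
    }
  }
  // No end condition met: the block runs to the end of the input.
  return lines.length;
}

const body = "component1\n\ncomponent2\ncomponent3\n";
// All 6 lines belong to one html block, including the trailing "world".
const preLines = htmlBlockLineCount("<pre>\n" + body + "</pre>world");
// Only the first 2 lines form the html block; the rest is parsed separately.
const divLines = htmlBlockLineCount("<div>\n" + body + "</div>world");
```

This matches the single html node reported for pre and the html, paragraph, html split reported for div.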