
MarkdownView does not handle self-closing tags as expected.

Open · Overu opened this issue 1 month ago · 3 comments

message:

<tutorial-course-abandon-dismissal />
I see you're back on your user page! Perfect - now we're ready to continue with our course on creating a new project.

Let's go back to the homepage first, then we'll learn how to create a project. Please click the <highlight-link target-id="1ci8_wBp" tip="Click to go to homepage">Logo link</highlight-link> in the top navigation bar.

Once we're on the homepage, we'll use the Project menu to create our new project!

It renders as: [screenshot]

Overu · Nov 17 '25

Cause

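After some digging: when the input above is passed to fromMarkdown from mdast-util-from-markdown, the tag and the line that follows it are recognized together as a single html_block.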

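This behavior actually conforms to the CommonMark spec. It is not a parsing bug but a consequence of Markdown's semantic rules for HTML and blank lines: a line consisting of nothing but a complete open or closing tag (a self-closing tag like this one counts as an open tag) starts an HTML block (start condition 7 in the spec), and such a block only ends at the next blank line.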
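To make the rule concrete, here is a rough TypeScript sketch of CommonMark's HTML block start condition 7 applied to a single line. It is a simplification written for this discussion, not code from mdast-util-from-markdown or micromark: the helper name startsType7HtmlBlock is hypothetical, it only inspects one line, and it ignores the other HTML block types (div and p, for example, fall under type 6).

```typescript
// Pieces of the CommonMark "open tag" / "closing tag" grammar, as regex source.
const TAG_NAME = "[A-Za-z][A-Za-z0-9-]*";
const ATTR_NAME = "[A-Za-z_:][A-Za-z0-9_.:-]*";
const ATTR_VALUE = "(?:[^\"'=<>`\\s]+|'[^']*'|\"[^\"]*\")"; // unquoted, single- or double-quoted
const ATTRIBUTE = `\\s+${ATTR_NAME}(?:\\s*=\\s*${ATTR_VALUE})?`;
const OPEN_TAG = `<(${TAG_NAME})(?:${ATTRIBUTE})*\\s*\\/?>`; // the optional "/" covers self-closing tags
const CLOSE_TAG = `<\\/(${TAG_NAME})\\s*>`;

// Start condition 7: up to three spaces of indentation, one complete open or
// closing tag, then nothing but spaces or tabs until the end of the line.
const TYPE7_LINE = new RegExp(`^ {0,3}(?:${OPEN_TAG}|${CLOSE_TAG})[ \\t]*$`);

// These tag names are handled by start condition 1 instead.
const TYPE1_NAMES = new Set(["pre", "script", "style", "textarea"]);

function startsType7HtmlBlock(line: string): boolean {
  const m = TYPE7_LINE.exec(line);
  if (m === null) return false;
  const name = (m[1] ?? m[2] ?? "").toLowerCase();
  return !TYPE1_NAMES.has(name);
}
```

A type 7 block also cannot interrupt a paragraph, so the check only matters for a line that starts the document or follows a blank line, which is where the tag sits in the message above.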

Change the input slightly:

Example 1:
<tutorial-course-abandon-dismissal />

I see you're back on your user page! Perfect - now we're ready to continue with our course on creating a new project.

Example 2:
<tutorial-course-abandon-dismissal />I see you're back on your user page! Perfect - now we're ready to continue with our course on creating a new project.

In both cases the result is as expected.

In example 1, an html_block ends at a blank line, so <tutorial-course-abandon-dismissal /> becomes an html_block and line 3 becomes a paragraph:

<html_block>&lt;tutorial-course-abandon-dismissal /&gt;</html_block>
<paragraph>
  <text>I see you</text>
  <text>'</text>
  <text>re back on your user page</text>
  <text>!</text>
  <text> Perfect - now we</text>
  <text>'</text>
  <text>re ready to continue with our course on creating a new project.</text>
</paragraph>

In example 2, the unrecognized tag (and likewise tags such as a or span) is parsed as html_inline and placed in the same paragraph as the rest of the text:

<paragraph>
  <html_inline>&lt;tutorial-course-abandon-dismissal /&gt;</html_inline>
  <text>I see you</text>
  <text>'</text>
  <text>re back on your user page</text>
  <text>!</text>
  <text> Perfect - now we</text>
  <text>'</text>
  <text>re ready to continue with our course on creating a new project.</text>
</paragraph>

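Coming back to the original problem: in the input there is no blank line between <tutorial-course-abandon-dismissal /> and the text that follows, so the tag cannot be parsed on its own, either as an html_block (as in example 1) or as an html_inline (as in example 2). Instead, everything up to the next blank line is placed in a single html_block: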

<html_block>&lt;tutorial-course-abandon-dismissal /&gt;
I see you're back on your user page! Perfect - now we're ready to continue with our course on creating a new project.</html_block>

As a result, the outer layer ultimately cannot parse the content.

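The ASTs above can be reproduced in the CommonMark playground. Its node names html_block and html_inline both correspond to html nodes in mdast.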

Possible solutions

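  1. Extend mdast parsing so that custom elements are parsed as html_inline
  2. Constrain the output format through the prompt
  3. Pre-processing: before the input is passed to fromMarkdown, add a blank line (or a backslash-escaped blank line) in the problematic cases described above
  4. Post-processing: modify the result returned by fromMarkdown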
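For comparison, option 4 might look roughly like the sketch below: after parsing, it walks the top-level nodes and splits an html node of the form "self-closing tag, newline, text" into an html node followed by a paragraph. The interfaces are simplified stand-ins for mdast's html, paragraph, and text nodes, and splitSelfClosingHtml is a hypothetical name.

```typescript
// Simplified stand-ins for the mdast nodes involved; real mdast nodes also
// carry position information, and paragraphs can hold other phrasing content.
interface HtmlNode { type: "html"; value: string }
interface TextNode { type: "text"; value: string }
interface ParagraphNode { type: "paragraph"; children: TextNode[] }
type BlockNode = HtmlNode | ParagraphNode;

// A self-closing tag, optional trailing spaces, a newline, then the rest.
const SELF_CLOSING_THEN_TEXT = /^(<[A-Za-z][A-Za-z0-9-]*[^<>]*\/>)[ \t]*\n([\s\S]+)$/;

function splitSelfClosingHtml(nodes: readonly BlockNode[]): BlockNode[] {
  const out: BlockNode[] = [];
  for (const node of nodes) {
    const m = node.type === "html" ? SELF_CLOSING_THEN_TEXT.exec(node.value) : null;
    if (m === null) {
      out.push(node); // anything else is passed through unchanged
      continue;
    }
    const [, tag = "", rest = ""] = m;
    out.push({ type: "html", value: tag });
    // The remainder becomes a single text node; it is NOT re-parsed as Markdown.
    out.push({ type: "paragraph", children: [{ type: "text", value: rest }] });
  }
  return out;
}
```

The weak spot is the remainder: it is kept as plain text, so any Markdown or inline custom tags inside it (such as highlight-link) would be lost unless fromMarkdown is run on it again. That is one argument for fixing the input before parsing rather than after.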

Overu · Nov 17 '25

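I suggest going with the pre-processing approach: before the input is passed to fromMarkdown, add a blank line (or a backslash-escaped blank line) in the problematic cases described above.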

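There should already be related logic on our side, such as preprocessCustomRawComponents and preprocessIncompleteTags, so adding one more step there should be enough. However, I'm worried it might affect other content that is expected to stay inline (e.g. highlight-link), so that needs testing too. If it does cause problems, we may need to split custom components further, from the current custom / customRaw into customInline / customBlock / customRaw, and apply this processing only to customBlock?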
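The proposed split could gate the fix roughly as below. Everything here is hypothetical: the CustomComponentKind type, the registry (which only contains the two tags seen in this issue), and shouldSeparate illustrate the idea and are not existing code. Only a line consisting of a single self-closing customBlock tag would trigger the blank-line insertion, so inline components such as highlight-link are never touched.

```typescript
// Hypothetical finer-grained categories from the proposal above.
type CustomComponentKind = "customInline" | "customBlock" | "customRaw";

// Illustrative registry containing only the two tags seen in this issue.
const COMPONENT_KINDS = new Map<string, CustomComponentKind>([
  ["highlight-link", "customInline"],
  ["tutorial-course-abandon-dismissal", "customBlock"],
]);

// A line that is exactly one self-closing tag; group 1 is the tag name.
const STANDALONE_SELF_CLOSING = /^ {0,3}<([A-Za-z][A-Za-z0-9-]*)(?:\s[^<>]*)?\/>[ \t]*$/;

// True only for a standalone self-closing tag registered as customBlock.
function shouldSeparate(line: string): boolean {
  const m = STANDALONE_SELF_CLOSING.exec(line);
  if (m === null) return false;
  return COMPONENT_KINDS.get(m[1] ?? "") === "customBlock";
}
```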

nighca · Nov 18 '25

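For now I've gone with pre-processing: in this case \n\n is inserted to force the tag and the following text apart. Since the problem only occurs when a self-closing tag is followed by a newline and then text, I haven't added anything like customInline for now.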
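A minimal sketch of that pre-processing step, assuming it runs on the raw Markdown string before fromMarkdown. separateSelfClosingTags is a hypothetical name, and the regular expression only recognizes a line made up of a single self-closing tag. Adding one empty line after such a line yields the \n\n separator described above.

```typescript
// A line consisting only of a self-closing tag (optionally indented by up to
// three spaces, as CommonMark allows for HTML block starts).
const STANDALONE_SELF_CLOSING = /^ {0,3}<[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?\/>[ \t]*$/;

function isBlank(line: string): boolean {
  return /^[ \t]*$/.test(line);
}

// When a standalone self-closing tag is immediately followed by a non-blank
// line, insert an empty line so the tag ends up in its own HTML block.
function separateSelfClosingTags(markdown: string): string {
  const lines = markdown.split("\n");
  const out: string[] = [];
  lines.forEach((line, i) => {
    out.push(line);
    const next = lines[i + 1];
    if (next !== undefined && STANDALONE_SELF_CLOSING.test(line) && !isBlank(next)) {
      out.push("");
    }
  });
  return out.join("\n");
}
```

The step is idempotent (an already separated tag is left alone) and ignores tags followed by text on the same line, which already parse as html_inline. It does not skip fenced code blocks or handle \r\n line endings; a real step next to preprocessCustomRawComponents would probably need both.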

Overu · Nov 18 '25

mdast-util-from-markdown behavior analysis

Open/close tags

The tests cover the following scenarios:

  1. Tag only
<tag-name></tag-name>
  2. Tag followed by text
<tag-name></tag-name>world
  3. Tag followed by a newline
<tag-name></tag-name>
world
  4. Text before and after the tag
hello<tag-name></tag-name>world
  • Unknown tags: inline

    1. [paragraph -> [tag-name]]
    2. [paragraph -> [tag-name, world]]
    3. [paragraph -> [tag-name, \nworld]]
    4. [paragraph -> [hello, tag-name, world]]
  • Known tags

    • a, span: inline

      1. [paragraph -> [span]]
      2. [paragraph -> [span, world]]
      3. [paragraph -> [span, \nworld]]
      4. [paragraph -> [hello, span, world]]
    • div, p (raw): block

      1. [html -> [div]]
      2. [html -> [divworld]]
      3. [html -> [div\nworld]]
      4. [paragraph -> [hello, div, world]]
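The split between the known block tags and everything else matches CommonMark's HTML block start condition 6: a line that begins with < or </ followed by one of a fixed list of block-level tag names (div and p among them, but not span or a) starts an HTML block regardless of what follows on the line. A hedged sketch, with a hypothetical helper name and only a subset of the spec's tag list:

```typescript
// A subset of the block-level tag names listed under CommonMark's HTML block
// start condition 6 (the spec's full list is much longer).
const TYPE6_NAMES = new Set(["blockquote", "div", "h1", "h2", "hr", "li", "ol", "p", "section", "table", "ul"]);

// "<" or "</", a tag name, then a space, tab, ">", "/>" or the end of the line.
const TYPE6_START = /^ {0,3}<\/?([A-Za-z][A-Za-z0-9]*)(?=[ \t>]|\/>|$)/;

function startsType6HtmlBlock(line: string): boolean {
  const m = TYPE6_START.exec(line);
  // Only the start of the line matters; whatever follows the tag is ignored.
  return m !== null && TYPE6_NAMES.has((m[1] ?? "").toLowerCase());
}
```

Unlike type 7, a type 6 block may also interrupt a paragraph. This is why <div></div>world becomes html, while <tag-name></tag-name>world and <span></span>world stay paragraphs: the line does not consist solely of one complete tag, so type 7 does not apply either.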

Self-closing tags

The tests cover the following scenarios:

  1. Tag only
<tag-name />
  2. Tag followed by text
<tag-name />world
  3. Tag followed by a newline
<tag-name />
world
  4. Text before and after the tag
hello<tag-name />world
  • Unknown tags: inline

    1. [html -> [tag-name/]]
    2. [paragraph -> [tag-name/, world]]
    3. [html -> [tag-name/\nworld]]
    4. [paragraph -> [hello, tag-name/, world]]
  • Known tags

    • a, span: inline

      1. [html -> [span/]]
      2. [paragraph -> [span/, world]]
      3. [html -> [span/\nworld]]
      4. [paragraph -> [hello, span/, world]]
    • div, p: block

      1. [html -> [div/]]
      2. [html -> [div/world]]
      3. [html -> [div/\nworld]]
      4. [paragraph -> [hello, div/, world]]

raw

The tests cover the following scenarios:

  1. Nothing before the tag
<tag-name>
component1

component2
component3
</tag-name>
  2. Content before the tag
hello<tag-name>
component1

component2
component3
</tag-name>world
  3. Content after the tag
<tag-name>
component1

component2
component3
</tag-name>world
  • pre

    1. [html -> [<pre>\ncomponent1\n\ncomponent2\ncomponent3\n]]
    2. [paragraph -> [hello, pre, ...], paragraph -> [..., world, /pre]]
    3. [html -> [<pre>\ncomponent1\n\ncomponent2\ncomponent3\nworld]]
  • div

    1. [html, paragraph, html]
    2. [paragraph -> [hello, div, ...], paragraph -> [..., world, /div]]
    3. [html, paragraph, html]
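The pre and div results differ because their HTML blocks end differently. pre belongs to CommonMark start condition 1 (together with script, style, and textarea), whose block runs through the first line containing one of those end tags, blank lines included; div (type 6) and unknown tags (type 7) stop before the first blank line. A sketch of the two end conditions, assuming the caller already knows which kind of block starts at lines[start] (htmlBlockLines and HtmlBlockKind are hypothetical names):

```typescript
// "type1": blocks started by <pre>, <script>, <style> or <textarea>.
// "type6or7": known block-level tags (div, p, ...) and any other complete tag.
type HtmlBlockKind = "type1" | "type6or7";

const BLANK_LINE = /^[ \t]*$/;
const TYPE1_END_TAG = /<\/(?:pre|script|style|textarea)>/i;

// Returns the lines of the HTML block that starts at lines[start].
function htmlBlockLines(lines: readonly string[], start: number, kind: HtmlBlockKind): string[] {
  const block: string[] = [];
  for (const line of lines.slice(start)) {
    // Types 6 and 7 end at a blank line, which is not part of the block.
    if (kind === "type6or7" && BLANK_LINE.test(line)) break;
    block.push(line);
    // Type 1 ends at the first line containing one of its end tags; that line
    // is included, and blank lines before it do not end the block.
    if (kind === "type1" && TYPE1_END_TAG.test(line)) break;
  }
  return block;
}
```

For scenario 1, the pre version keeps every line in one block, while the div version stops after component1; component2 and component3 then form a paragraph, and the closing </div> line starts a new HTML block, giving the [html, paragraph, html] result above.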

Overu · Dec 01 '25