commonmark-java

Table header is not recognized when only a single "\n" separates it from the preceding paragraph.

Open Yuanarcheannovice opened this issue 10 months ago • 2 comments

Steps to reproduce the problem (provide example Markdown if applicable):

" \n\n**aa**:  \n| ab   | bb     | cc     | dd     | ee     |\n|--------|----------|----------|----------|----------|\n| ab   | dff| dff| sf| sdfa|\n| sdf   | dfs| sdf| dsf| sdfs|\n| sdf   | sdf   | sdf       | ss   | d       |\n| ss   | dd   | ab       | bc   | 水       |\n"

Expected behavior:


Actual behavior:


The input is in the Markdown style that AI models typically output.

(Also see what the reference implementation does: https://spec.commonmark.org/dingus/)

Yuanarcheannovice avatar Apr 01 '25 02:04 Yuanarcheannovice

A more minimal example of what this is about (I believe):

text
|a  |
|---|
|b  |

GitHub renders it as a paragraph followed by a table; commonmark-java renders it as just a paragraph.

The difficult part here is that the table needs to interrupt the paragraph, but we only know it's a table once we reach the third line (|---|). Note that GitHub's spec about tables doesn't mention this case, but anyway. We should look into whether this can be supported. It might need changes in the extension APIs.
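To illustrate the ordering problem described above, here is a minimal standalone sketch in plain Java. It is not commonmark-java's implementation and does not use its API; the class `TableInterruptSketch` and the methods `isDelimiterRow` and `findTableHeader` are hypothetical names, and the delimiter-row check is a deliberate simplification of the GFM rule (it requires a pipe and does not compare column counts). It only shows that the header line can be identified after the delimiter row has been seen, at which point the earlier paragraph lines have to be split off.

```java
import java.util.List;
import java.util.regex.Pattern;

public class TableInterruptSketch {

    // Simplified delimiter-row pattern: cells of dashes with optional colons,
    // e.g. |---| or | :--- | ---: |. The real GFM rule has more cases.
    private static final Pattern DELIMITER_ROW =
            Pattern.compile("^\\s*\\|?\\s*:?-+:?\\s*(\\|\\s*:?-+:?\\s*)*\\|?\\s*$");

    // Assumption for this sketch: a delimiter row must contain at least one pipe.
    public static boolean isDelimiterRow(String line) {
        return line.contains("|") && DELIMITER_ROW.matcher(line).matches();
    }

    /**
     * Given the paragraph lines collected so far and the line that follows them,
     * returns the index of the paragraph line that would become the table header,
     * or -1 if the next line does not start a table.
     */
    public static int findTableHeader(List<String> paragraphLines, String nextLine) {
        if (paragraphLines.isEmpty() || !isDelimiterRow(nextLine)) {
            return -1;
        }
        int last = paragraphLines.size() - 1;
        // Only the last paragraph line can be the header row.
        return paragraphLines.get(last).contains("|") ? last : -1;
    }

    public static void main(String[] args) {
        List<String> paragraph = List.of("text", "|a  |");
        int header = findTableHeader(paragraph, "|---|");
        // Lines before the header stay in the paragraph; the header moves to the table.
        System.out.println("paragraph lines: " + paragraph.subList(0, header));
        System.out.println("table header:    " + paragraph.get(header));
    }
}
```

With the minimal example from this comment, the sketch keeps "text" in the paragraph and treats "|a  |" as the header, which matches how GitHub renders it.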

robinst avatar Jun 10 '25 23:06 robinst

public static class Factory extends AbstractBlockParserFactory {

    @Override
    public BlockStart tryStart(ParserState state, MatchedBlockParser matchedBlockParser) {
        CharSequence line = state.getLine();
        CharSequence paragraph = matchedBlockParser.getParagraphContent();
        if (paragraph != null && paragraph.toString().contains("|") && !paragraph.toString().contains("\n")) {
            CharSequence separatorLine = line.subSequence(state.getIndex(), line.length());
            List<TableCell.Alignment> columns = parseSeparator(separatorLine);
            if (columns != null && !columns.isEmpty()) {
                List<String> headerCells = split(paragraph);
                if (columns.size() >= headerCells.size()) {
                    return BlockStart.of(new TableBlockParser(columns, headerCells))
                            .atIndex(state.getIndex())
                            .replaceActiveBlockParser();
                }
            }
        }

        try {
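            // Support the following format: a table inside a list, where only a single newline precedes the table
            // * First list item\n* Second list item\n| Name | Birth-death years | Major works |\n|------|----------|------|----------|\n| Li Bai | 701-762 | "Quiet Night Thought", "Bring in the Wine" |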
            if (matchedBlockParser.getMatchedBlockParser() instanceof ListBlockParser) {
                CharSequence pbOrigin = ((ParagraphParser) (state.getActiveBlockParser())).getContentString(); // content of the previous ListItem
                if (pbOrigin != null && pbOrigin.toString().contains("\n")) {
                    String[] pbs = pbOrigin.toString().split("\n");
                    if (pbs.length == 2) {
                        String pb = pbs[pbs.length - 1]; // get the table header line
                        CharSequence separatorLine = line.subSequence(state.getIndex(), line.length());
                        List<TableCell.Alignment> columns = parseSeparator(separatorLine);
                        if (columns != null && !columns.isEmpty()) {
                            List<String> headerCells = split(pb);
                            if (columns.size() >= headerCells.size()) {
                                StringBuilder pbSB = (StringBuilder) pbOrigin;
                                deleteStringEndText(pbSB, pb); // remove the table header from the previous ListItem
                                return BlockStart.of(new TableBlockParser(columns, headerCells))
                                        .atIndex(state.getIndex());
                            }
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return BlockStart.none();
    }
}

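// Removes suffix from the end of sb if sb ends with it; otherwise leaves sb unchanged.
private static void deleteStringEndText(StringBuilder sb, CharSequence suffix) {
    int len = suffix.length();
    // contentEquals compares characters, so this also works when suffix is not a String
    if (sb.length() >= len && sb.substring(sb.length() - len).contentEquals(suffix)) {
        sb.delete(sb.length() - len, sb.length());
    }
}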
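
For anyone adapting the workaround above, here is a small standalone check of a suffix-removal helper like `deleteStringEndText`. The wrapper class `SuffixRemovalSketch` is a hypothetical name used only for this sketch. This variant compares with `String.contentEquals`, because `String.equals` returns false for any argument that is not itself a `String` (for example a `StringBuilder`), even when the characters match.

```java
public class SuffixRemovalSketch {

    // Removes suffix from the end of sb if sb ends with it; otherwise leaves sb unchanged.
    public static void deleteStringEndText(StringBuilder sb, CharSequence suffix) {
        int len = suffix.length();
        // contentEquals compares character content for any CharSequence.
        if (sb.length() >= len && sb.substring(sb.length() - len).contentEquals(suffix)) {
            sb.delete(sb.length() - len, sb.length());
        }
    }

    public static void main(String[] args) {
        // Paragraph content of a list item, followed by what turns out to be a table header.
        StringBuilder content = new StringBuilder("list item\n| a |");
        // The suffix is passed as a StringBuilder to exercise the non-String case.
        deleteStringEndText(content, new StringBuilder("| a |"));
        System.out.println(content.toString().equals("list item\n")); // prints "true"
    }
}
```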

az4mxl avatar Jun 16 '25 14:06 az4mxl

Released in 0.25.0 now: https://github.com/commonmark/commonmark-java/releases/tag/commonmark-parent-0.25.0

robinst avatar Jun 20 '25 13:06 robinst