
Question: is there documentation for the block_label types?

Open Stefan3Zz opened this issue 2 months ago • 3 comments

Where can I find documentation explaining what each layout category returned by paddleocr-vl means? The category is recorded in the block_label field.

return {
    "paragraph_title": format_title_func,
    "abstract_title": format_title_func,
    "reference_title": format_title_func,
    "content_title": format_title_func,
    "doc_title": lambda block: f"# {block.content}".replace("-\n", "").replace(
        "\n", " "
    ),
    "table_title": text_func,
    "figure_title": text_func,
    "chart_title": text_func,
    "vision_footnote": lambda block: block.content.replace("\n\n", "\n").replace(
        "\n", "\n\n"
    ),
    "text": lambda block: block.content.replace("\n\n", "\n").replace("\n", "\n\n"),
    "ocr": lambda block: block.content.replace("\n\n", "\n").replace("\n", "\n\n"),
    "vertical_text": lambda block: block.content.replace("\n\n", "\n").replace(
        "\n", "\n\n"
    ),
    "reference_content": lambda block: block.content.replace("\n\n", "\n").replace(
        "\n", "\n\n"
    ),
    "abstract": partial(
        format_first_line_func,
        templates=["摘要", "abstract"],
        format_func=lambda l: f"## {l}\n",
        spliter=" ",
    ),
    "content": lambda block: block.content.replace("-\n", "  \n").replace(
        "\n", "  \n"
    ),
    "image": image_func,
    "chart": chart_func,
    "formula": formula_func,
    "display_formula": formula_func,
    "inline_formula": formula_func,
    "table": table_func,
    "reference": partial(
        format_first_line_func,
        templates=["参考文献", "references"],
        format_func=lambda l: f"## {l}",
        spliter="\n",
    ),
    "algorithm": lambda block: block.content.strip("\n"),
    "seal": seal_func,
}

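From what I can see, your source code only handles the types above. Could you provide documentation that explains what each type is? I also see other types, such as header, that you do not handle.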

Stefan3Zz, Oct 23 '25 04:10

Same question here.

Kerrycarry, Oct 27 '25 05:10

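The documentation does not include a detailed reference for the label categories, but the configuration file lists them. Please see: https://github.com/PaddlePaddle/PaddleX/blob/e0c509eef1b333e3a57545b04a47f7f701fadfb1/paddlex/configs/pipelines/PaddleOCR-VL.yaml#L19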

leo-q8, Oct 28 '25 03:10

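The documentation does not include a detailed reference for the label categories, but the configuration file lists them. Please see: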

PaddleX/paddlex/configs/pipelines/PaddleOCR-VL.yaml

Line 19 in e0c509e

threshold:

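I know the types are listed there. What I am asking is whether there is a Chinese-language description of them: what kind of document element each recognized type represents, whether a given type should be discarded, and so on.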

Stefan3Zz, Nov 03 '25 02:11
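
Until a label reference exists, one way to see which block_label values fall outside the handler dict quoted above, and to drop the ones you do not want, might look like the sketch below. It is only an illustration under stated assumptions: `Block`, `split_blocks`, `HANDLED_LABELS`, and `DROP_LABELS` are hypothetical names and not part of PaddleX, the real result objects may be shaped differently, and which labels to discard is a choice for your own use case rather than something this thread settles.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a parsed layout block; the real objects
# returned by the pipeline may be shaped differently.
@dataclass
class Block:
    label: str    # value of the block_label field
    content: str


# A subset of the keys from the handler dict quoted in this issue.
HANDLED_LABELS = {"doc_title", "paragraph_title", "text", "table", "image", "formula"}

# Illustrative choice only: which labels to discard is up to the caller.
DROP_LABELS = {"header"}


def split_blocks(blocks):
    """Drop unwanted labels and report labels that have no handler."""
    kept = [b for b in blocks if b.label not in DROP_LABELS]
    unhandled = sorted({b.label for b in kept if b.label not in HANDLED_LABELS})
    return kept, unhandled


blocks = [
    Block("header", "Page 1"),
    Block("doc_title", "A Title"),
    Block("text", "Body text."),
    Block("made_up_label", "..."),  # invented label, for illustration only
]

kept, unhandled = split_blocks(blocks)
print([b.label for b in kept])
print(unhandled)
```

Logging the `unhandled` list on real output is a cheap way to find out which labels your documents actually produce before deciding how to treat each one.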