PaddleX
Question: documentation for block_label types
When using PaddleOCR-VL, where can I find documentation explaining what each returned layout category means? The block_label field is where the type is recorded.
return {
    "paragraph_title": format_title_func,
    "abstract_title": format_title_func,
    "reference_title": format_title_func,
    "content_title": format_title_func,
    "doc_title": lambda block: f"# {block.content}".replace("-\n", "").replace(
        "\n", " "
    ),
    "table_title": text_func,
    "figure_title": text_func,
    "chart_title": text_func,
    "vision_footnote": lambda block: block.content.replace("\n\n", "\n").replace(
        "\n", "\n\n"
    ),
    "text": lambda block: block.content.replace("\n\n", "\n").replace("\n", "\n\n"),
    "ocr": lambda block: block.content.replace("\n\n", "\n").replace("\n", "\n\n"),
    "vertical_text": lambda block: block.content.replace("\n\n", "\n").replace(
        "\n", "\n\n"
    ),
    "reference_content": lambda block: block.content.replace("\n\n", "\n").replace(
        "\n", "\n\n"
    ),
    "abstract": partial(
        format_first_line_func,
        templates=["摘要", "abstract"],
        format_func=lambda l: f"## {l}\n",
        spliter=" ",
    ),
    "content": lambda block: block.content.replace("-\n", " \n").replace(
        "\n", " \n"
    ),
    "image": image_func,
    "chart": chart_func,
    "formula": formula_func,
    "display_formula": formula_func,
    "inline_formula": formula_func,
    "table": table_func,
    "reference": partial(
        format_first_line_func,
        templates=["参考文献", "references"],
        format_func=lambda l: f"## {l}",
        spliter="\n",
    ),
    "algorithm": lambda block: block.content.strip("\n"),
    "seal": seal_func,
}
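I see that your source code only handles these types. Could you provide documentation explaining what each type is? I noticed there are others, such as header, that you do not handle.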
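For readers following along, here is a minimal sketch of the dispatch pattern the snippet above appears to use: a dict maps each label to a formatter, and a label with no entry (header, for example) produces no output. This is not the PaddleX implementation. `Block`, `paragraph_func`, `format_block`, and `to_markdown` are hypothetical names made up for illustration, and only three of the labels are reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Block:
    """Hypothetical stand-in for a parsed layout block (not a PaddleX class)."""
    label: str
    content: str


def paragraph_func(block: Block) -> str:
    # Same string transformation the snippet applies to "text":
    # collapse double newlines, then turn every newline into a blank line.
    return block.content.replace("\n\n", "\n").replace("\n", "\n\n")


# Label -> formatter table, mirroring a few entries of the snippet.
HANDLERS: Dict[str, Callable[[Block], str]] = {
    "doc_title": lambda b: f"# {b.content}".replace("-\n", "").replace("\n", " "),
    "text": paragraph_func,
    "algorithm": lambda b: b.content.strip("\n"),
}


def format_block(block: Block) -> Optional[str]:
    """Return formatted text, or None when the label has no handler."""
    handler = HANDLERS.get(block.label)
    if handler is None:
        return None  # assumption: unhandled labels such as "header" are skipped
    return handler(block)


def to_markdown(blocks: Iterable[Block]) -> str:
    """Join the formatted blocks, leaving out the ones with no handler."""
    parts = [format_block(b) for b in blocks]
    return "\n\n".join(p for p in parts if p is not None)
```

Under these assumptions, a header block passed to `to_markdown` simply disappears from the result, which would match the observation that header is not among the handled types.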
Same question.
The documentation does not include a detailed label-category reference, but the config file lists the labels. Please refer to: https://github.com/PaddlePaddle/PaddleX/blob/e0c509eef1b333e3a57545b04a47f7f701fadfb1/paddlex/configs/pipelines/PaddleOCR-VL.yaml#L19 (line 19 of PaddleX/paddlex/configs/pipelines/PaddleOCR-VL.yaml at commit e0c509e, the threshold: key)
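I know the types are listed there. What I mean is: is there a Chinese-language description of these types? What kind of document element does each one represent, and should any of them be discarded?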
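Until an official reference exists, one workaround is to keep your own notes table and drop list on the application side. The sketch below is only an illustration. The descriptions are guesses inferred from the label names, not official definitions. The choice to drop header is an example policy, not a recommendation from the maintainers. It also assumes each block is a dict with a block_label key; the block_content key and the helper names `keep_blocks` and `describe` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Descriptions inferred from the label names only; NOT official documentation.
LABEL_NOTES: Dict[str, str] = {
    "doc_title": "document title",
    "paragraph_title": "section or paragraph heading",
    "abstract": "abstract",
    "text": "body text",
    "table": "table",
    "formula": "formula",
    "header": "page header",
}

# Assumption: which labels to discard is an application decision.
DROP_LABELS: Set[str] = {"header"}


def keep_blocks(blocks: Iterable[dict], drop: Set[str] = DROP_LABELS) -> List[dict]:
    """Return the blocks whose block_label is not in the drop set."""
    return [b for b in blocks if b.get("block_label") not in drop]


def describe(label: str) -> str:
    """Look up a note for a label, with a fallback for labels not in the table."""
    return LABEL_NOTES.get(label, "unknown (check the pipeline config)")
```

Keeping the drop list separate from the notes table means you can change what is discarded without touching the descriptions, and any label you have not reviewed yet shows up as unknown rather than being silently dropped.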