
TableRecognitionPipelineV2 table detection and recognition: the recognized texts and confidence scores have different lengths

Open wqw0806 opened this issue 1 month ago • 1 comment

🔎 Search before asking

  • [x] I have searched the PaddleOCR Docs and found no similar bug report.
  • [x] I have searched the PaddleOCR Issues and found no similar bug report.
  • [x] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem Description)

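When running table detection and recognition with TableRecognitionPipelineV2, the list of recognized texts and the list of confidence scores have different lengths.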

🏃‍♂️ Environment (Runtime Environment)

paddleocr 3.3.0

🌰 Minimal Reproducible Example (Minimal Demo That Reproduces the Problem)

from paddleocr import TableRecognitionPipelineV2

pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=False, use_doc_unwarping=False)

output = pipeline.predict(r"test_imgs/03.png")
for res in output:
    # res.print()  # print the structured prediction output
    res.save_to_img("./outputpptable/")
    res.save_to_xlsx("./outputpptable/")
    res.save_to_html("./outputpptable/")
    res.save_to_json("./outputpptable/")

ocr_res = []
for res in output:
    cell_ocr_result = res.get("table_res_list", [])
    # print(cell_ocr_result)
    for cell in cell_ocr_result:
        # All cell boxes in the table (array type, len=92)
        cell_box = cell.get("cell_box_list")
        # Convert to lists of ints, shaped like [[1, 2, 3, 4], [1, 2, 3, 4]]
        cell_box_list = [[int(coord) for coord in array] for array in cell_box]

        # cell_box = cell.get("table_ocr_pred").get("rec_boxes")  # boxes of the OCR text in the table
        cell_ocr = cell.get("table_ocr_pred").get("rec_texts")
        print(len(cell_ocr))  # 70
        cell_score = cell.get("table_ocr_pred").get("rec_scores")
        print(len(cell_score))  # 67

Why do the confidence scores and the texts in the table result have different lengths?

wqw0806 Nov 11 '25 10:11
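
To pin down the mismatch per table, the lengths of the OCR fields can be compared directly. This is a minimal sketch, assuming each entry of table_res_list behaves like a dict whose "table_ocr_pred" mapping holds rec_texts, rec_scores, and rec_boxes, as accessed in the report above. The helper names `ocr_field_lengths` and `is_consistent` and the sample data are hypothetical and are not part of the PaddleOCR API.

```python
def ocr_field_lengths(table_res, fields=("rec_texts", "rec_scores", "rec_boxes")):
    """Return {field: length} for the OCR fields of one table result.

    `table_res` is assumed to be dict-like with a "table_ocr_pred" mapping,
    mirroring the structure accessed in the report above.
    """
    ocr_pred = table_res.get("table_ocr_pred") or {}
    lengths = {}
    for name in fields:
        value = ocr_pred.get(name)
        # A missing field is reported as length 0 instead of raising.
        lengths[name] = len(value) if value is not None else 0
    return lengths


def is_consistent(lengths):
    """True when every field has the same length."""
    return len(set(lengths.values())) <= 1


# Hypothetical data shaped like one table_res_list entry (not real output).
sample = {
    "table_ocr_pred": {
        "rec_texts": ["a", "b", "c"],
        "rec_scores": [0.99, 0.98],
        "rec_boxes": [[0, 0, 1, 1], [1, 0, 2, 1], [2, 0, 3, 1]],
    }
}
lengths = ocr_field_lengths(sample)
print(lengths, "consistent:", is_consistent(lengths))
```

Running the same helper on each real entry of table_res_list would show whether rec_scores is the only field that falls short, which is what the reported numbers suggest.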

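[Image]

For this image, running the code above recognizes the table content. In the returned table_res_list, table_ocr_pred.rec_boxes has length 69, table_ocr_pred.rec_texts has length 69, and table_ocr_pred.rec_scores has length 64. Why are the lengths inconsistent? I also checked whether whitespace is the cause: text entries returned as whitespace have a confidence of 0.0, so the mismatch should not be related to whitespace.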

wqw0806 Nov 12 '25 02:11
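
The whitespace check described in the comment above can be sketched as a count comparison. This is only a heuristic under stated assumptions: rec_texts is a list of strings and rec_scores is a list of numbers, and the function name `whitespace_report` and the sample values are hypothetical. Blank texts and zero scores are counted independently, because pairing text i with score i is unreliable once the two lists differ in length.

```python
def whitespace_report(rec_texts, rec_scores):
    """Compare the length gap against blank texts and zero scores."""
    gap = len(rec_texts) - len(rec_scores)
    # Texts that are empty or contain only whitespace.
    blank_texts = sum(1 for t in rec_texts if not str(t).strip())
    # Scores that are exactly 0.0, as reported for blank texts.
    zero_scores = sum(1 for s in rec_scores if float(s) == 0.0)
    return {"gap": gap, "blank_texts": blank_texts, "zero_scores": zero_scores}


# Hypothetical values, not real pipeline output.
texts = ["Name", " ", "Total", "", "42"]
scores = [0.99, 0.0, 0.97, 0.0]
report = whitespace_report(texts, scores)
print(report)
```

If the number of blank texts matches the number of zero scores while the gap is still positive, the blank entries already have scores, which supports the conclusion that whitespace does not explain the missing entries.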