Recognized texts and confidence scores have different lengths when using TableRecognitionPipelineV2 for table detection and recognition

Open wqw0806 opened this issue 1 month ago • 1 comments

🔎 Search before asking

[x] I have searched the PaddleOCR Docs and found no similar bug report.
[x] I have searched the PaddleOCR Issues and found no similar bug report.
[x] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem Description)

🏃‍♂️ Environment

paddleocr 3.3.0

🌰 Minimal Reproducible Example

from paddleocr import TableRecognitionPipelineV2

pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=False, use_doc_unwarping=False)

output = pipeline.predict(r"test_imgs/03.png")
for res in output:
    # res.print()  # print the predicted structured output
    res.save_to_img("./outputpptable/")
    res.save_to_xlsx("./outputpptable/")
    res.save_to_html("./outputpptable/")
    res.save_to_json("./outputpptable/")

ocr_res = []
for res in output:
    cell_ocr_result = res.get("table_res_list", [])
    # print(cell_ocr_result)
    for cell in cell_ocr_result:
        # All cell boxes in the table, array type, len=92
        cell_box = cell.get("cell_box_list")
        # Convert to nested int lists, e.g. [[1,2,3,4], [1,2,3,4]]
        cell_box_list = [[int(coord) for coord in array] for array in cell_box]

        # cell_box = cell.get("table_ocr_pred").get("rec_boxes")  # boxes of the OCR text in the table
        cell_ocr = cell.get("table_ocr_pred").get("rec_texts")
        print(len(cell_ocr))  # 70
        cell_score = cell.get("table_ocr_pred").get("rec_scores")
        print(len(cell_score))  # 67

Why do the confidence scores and the texts in the table result have different lengths?
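To make the mismatch easier to report, the lengths of all three lists can be collected per table in one place. This is only a sketch: `summarize_lengths` is a hypothetical helper (not part of PaddleOCR), it assumes `table_ocr_pred` behaves like a dict with the `rec_boxes`, `rec_texts`, and `rec_scores` keys used in this issue, and the sample dict is synthetic data rather than real pipeline output.

```python
def summarize_lengths(table_ocr_pred, keys=("rec_boxes", "rec_texts", "rec_scores")):
    """Return (lengths, consistent) for the given keys of an OCR result dict.

    Hypothetical helper, not a PaddleOCR API. A missing key counts as length 0.
    """
    lengths = {}
    for key in keys:
        value = table_ocr_pred.get(key)
        # Compare against None explicitly so array-like values are handled too.
        lengths[key] = 0 if value is None else len(value)
    # The lists are consistent when every key reports the same length.
    consistent = len(set(lengths.values())) <= 1
    return lengths, consistent


# Synthetic stand-in for cell.get("table_ocr_pred"); not real output.
sample = {
    "rec_boxes": [[0, 0, 10, 10], [10, 0, 20, 10], [20, 0, 30, 10]],
    "rec_texts": ["a", "b", "c"],
    "rec_scores": [0.99, 0.98],
}
lengths, consistent = summarize_lengths(sample)
print(lengths, consistent)
```

In the reproduction script, this could be called as `summarize_lengths(cell.get("table_ocr_pred"))` inside the inner loop, so that each table's lengths are printed together.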

Nov 11 '25 10:11 wqw0806

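For this image, running the code above recognizes the table content. In the returned table_res_list, table_ocr_pred.rec_boxes has length 69, table_ocr_pred.rec_texts has length 69, and table_ocr_pred.rec_scores has length 64. Why are the lengths inconsistent? I also checked whether whitespace was the cause: whitespace-only text entries are returned with a confidence of 0.0, so the mismatch does not appear to be related to whitespace.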
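The whitespace check can be made repeatable by counting blank text entries and comparing that count with the gap between the two lists. This is a rough sketch under assumptions: `blank_text_report` is a hypothetical helper (not a PaddleOCR function), the inputs are assumed to be plain sequences like `rec_texts` and `rec_scores`, and the sample values are synthetic. It only counts entries and cannot by itself show which texts are missing a score.

```python
def blank_text_report(rec_texts, rec_scores):
    """Compare the texts/scores length gap with the number of blank texts.

    Hypothetical diagnostic helper, not a PaddleOCR API.
    """
    # Indices of entries that are empty or contain only whitespace.
    blank_indices = [i for i, text in enumerate(rec_texts) if not str(text).strip()]
    return {
        "blank_count": len(blank_indices),
        "blank_indices": blank_indices,
        "length_gap": len(rec_texts) - len(rec_scores),
    }


# Synthetic example data; not taken from the image in this issue.
texts = ["Name", " ", "Total", ""]
scores = [0.99, 0.0, 0.97]
report = blank_text_report(texts, scores)
print(report)
```

If `blank_count` and `length_gap` differ, dropping the score of every blank entry would not explain the mismatch on its own, which would be consistent with the observation that blank texts still receive a 0.0 score.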

Nov 12 '25 02:11 wqw0806