TableRecognitionPipelineV2 table detection and recognition returns texts and confidence scores of different lengths
🔎 Search before asking
- [x] I have searched the PaddleOCR Docs and found no similar bug report.
- [x] I have searched the PaddleOCR Issues and found no similar bug report.
- [x] I have searched the PaddleOCR Discussions and found no similar bug report.
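🐛 Bug (Problem description)
When using TableRecognitionPipelineV2 for table detection and recognition, the recognized texts and their confidence scores have different lengths.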
🏃♂️ Environment
paddleocr 3.3.0
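In case the exact versions matter for reproducing this, here is a small sketch for collecting them with the standard library. The helper name `installed_version` is hypothetical, and the list of distributions is my assumption about which packages are relevant (a GPU install would use `paddlepaddle-gpu` instead).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumed list of relevant distributions; adjust it to match your install.
for name in ("paddleocr", "paddlex", "paddlepaddle"):
    print(name, installed_version(name))
```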
🌰 Minimal Reproducible Example
from paddleocr import TableRecognitionPipelineV2
pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=False, use_doc_unwarping=False)

output = pipeline.predict(r"test_imgs/03.png")
for res in output:
    # res.print()  # print the structured prediction output
    res.save_to_img("./outputpptable/")
    res.save_to_xlsx("./outputpptable/")
    res.save_to_html("./outputpptable/")
    res.save_to_json("./outputpptable/")

ocr_res = []
for res in output:
    cell_ocr_result = res.get("table_res_list", [])
    # print(cell_ocr_result)
    for cell in cell_ocr_result:
        # All cell boxes in the table, array type, len=92
        cell_box = cell.get("cell_box_list")
        # Convert to lists of int, shaped like [[1,2,3,4], [1,2,3,4]]
        cell_box_list = [[int(coord) for coord in array] for array in cell_box]
        # cell_box = cell.get("table_ocr_pred").get("rec_boxes")  # boxes of the OCR text in the table
        cell_ocr = cell.get("table_ocr_pred").get("rec_texts")
        print(len(cell_ocr))  # 70
        cell_score = cell.get("table_ocr_pred").get("rec_scores")
        print(len(cell_score))  # 67
Why do the confidence scores and the texts in the table result have different lengths?
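For this image, running the code above recognizes the table content. Among the values under table_res_list in the return value, table_ocr_pred.rec_boxes has length 69, table_ocr_pred.rec_texts has length 69, and table_ocr_pred.rec_scores has length 64. Why are the lengths inconsistent? I also checked whether spaces were the cause: the blank entries returned in the texts have a confidence of 0.0, so they do get a score, which suggests the mismatch is unrelated to spaces.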
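To make the check easier to repeat on other images, here is a minimal sketch of how the three fields can be compared. It only assumes that `table_ocr_pred` behaves like a dict with the keys `rec_boxes`, `rec_texts` and `rec_scores`, as in the code above. The helper names (`length_report`, `is_consistent`, `blank_text_indices`) are hypothetical, and `fake_pred` is synthetic data standing in for real pipeline output.

```python
from typing import Any, Mapping


def length_report(ocr_pred: Mapping[str, Any],
                  keys=("rec_boxes", "rec_texts", "rec_scores")) -> dict:
    """Return {key: length} for each of the parallel fields present in ocr_pred."""
    return {k: len(ocr_pred[k]) for k in keys if ocr_pred.get(k) is not None}


def is_consistent(report: dict) -> bool:
    """True when every reported field has the same length."""
    return len(set(report.values())) <= 1


def blank_text_indices(texts) -> list:
    """Indices of entries that are empty or whitespace only."""
    return [i for i, t in enumerate(texts) if not str(t).strip()]


# Synthetic stand-in for cell.get("table_ocr_pred"); not real pipeline output.
fake_pred = {
    "rec_boxes": [[0, 0, 10, 10], [10, 0, 20, 10], [20, 0, 30, 10]],
    "rec_texts": ["a", " ", "c"],
    "rec_scores": [0.98, 0.91],
}

report = length_report(fake_pred)
print(report)
print("consistent:", is_consistent(report))
print("blank texts at:", blank_text_indices(fake_pred["rec_texts"]))
```

With real output, `length_report(cell.get("table_ocr_pred"))` could be called inside the inner loop of the example above. Comparing the number of blank texts with the size of the gap between `rec_texts` and `rec_scores` is one way to confirm or rule out the whitespace explanation.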