
Bug when running the new version: IndexError: index 10 is out of bounds for axis 0 with size 10

Open Maple0709 opened this issue 1 year ago • 3 comments

Description of the bug

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/data/MinerU/app.py", line 61, in file_extract
    pipe.pipe_analyze(pdf_bytes, pdf_type)
  File "/data/MinerU/magic_pdf/pipe/UNIPipe.py", line 69, in pipe_analyze
    self.model_list = doc_analyze(pdf_bytes, self.ocr_custom_model, ocr=True,isimage=False,
  File "/data/MinerU/magic_pdf/model/doc_analyze_by_custom_model.py", line 136, in doc_analyze
    result = custom_model(img)
  File "/data/MinerU/magic_pdf/model/pdf_extract_kit.py", line 351, in __call__
    ocr_res = self.ocr_model.ocr(new_image, mfd_res=adjusted_mfdetrec_res)[0]
  File "/data/MinerU/magic_pdf/model/pek_sub_modules/self_modify.py", line 290, in ocr
    dt_boxes, rec_res, _ = self.__call__(img, cls, mfd_res=mfd_res)
  File "/data/MinerU/magic_pdf/model/pek_sub_modules/self_modify.py", line 371, in __call__
    rec_res, elapse = self.text_recognizer(img_crop_list)
  File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/tools/infer/predict_rec.py", line 630, in __call__
    rec_res[indices[beg_img_no + rno]] = rec_result[rno]
IndexError: index 10 is out of bounds for axis 0 with size 10

How to reproduce the bug

In the new version, running the application with multiple threads raises IndexError: index 10 is out of bounds for axis 0 with size 10
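
To check whether a failure like this depends on thread concurrency, one option is a small harness that runs the same job with 1 thread and then with several, and collects any exceptions. This is only a sketch: `run_concurrently` and `fake_extract` are hypothetical names, and `fake_extract` is a stand-in for the real per-file extraction call, which is not shown in this report.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(process_one, items, workers):
    """Run process_one over items using `workers` threads.

    Returns the list of exceptions raised by the individual tasks.
    """
    errors = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_one, item) for item in items]
        for future in futures:
            exc = future.exception()  # blocks until this task has finished
            if exc is not None:
                errors.append(exc)
    return errors


if __name__ == "__main__":
    def fake_extract(path):
        # Hypothetical placeholder: call the real extraction pipeline here.
        return len(path)

    paths = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]
    for workers in (1, 8):
        errors = run_concurrently(fake_extract, paths, workers)
        print(f"workers={workers}: {len(errors)} error(s)")
```

If errors appear only when `workers` is greater than 1, the problem is more likely related to shared state between threads than to a specific input file.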

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.8.x

Device mode

cuda

Maple0709 avatar Sep 18 '24 03:09 Maple0709

Does the PDF file that fails also trigger this problem when run in a single thread?

myhloli avatar Sep 18 '24 03:09 myhloli

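Recent versions changed some of the code in magic_pdf/model/pdf_extract_kit.py and magic_pdf/model/pek_sub_modules/self_modify.py. Looking at your traceback, the line numbers do not match the latest version. Please try updating to the latest version and testing again.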

myhloli avatar Sep 18 '24 03:09 myhloli

I ran into this problem too: 8 concurrent requests raise the error, 4 concurrent requests do not. paddlepaddle-gpu 2.6.2 paddleocr 2.8.1

georgewangchn avatar Sep 23 '24 03:09 georgewangchn

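This is caused by paddleocr not supporting multithreading. Wherever possible, please use multiple processes rather than multiple threads to handle concurrency.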

myhloli avatar Jan 05 '25 15:01 myhloli