PaddleOCR issues

Characters such as '-', '$', '€' not being detected by paddleocr

1

I am using the tablebank layout detector and the ocr model of paddleocr to detect tables in an image and extract the text in the detected table to a csv...

saanvib13

Report: memory leak after 2 hours of using the PaddleOCR CPU package on a CPU machine

29

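__Description:__ After about 2 hours of using the PaddleOCR package on a CPU machine, a memory leak appeared. Although the images are processed in a loop, OCR memory usage keeps increasing and no memory is released, which eventually exhausts memory. __Environment:__ PaddleOCR version: 2.73 Python version: 3.11.8 Operating system: Windows 10 + the following packages installed: paddlepaddle==2.6.1 -i https://pypi.tuna.tsinghua.edu.cn/simpleol paddleocr==2.73 hanzidentifier==1.1.0 pillow==10.3.0 fastapi[all]==0.110.3 __Expected behavior:__ Memory usage during OCR processing should stay stable, or may grow gradually, but should be released after each image is processed so that memory is not exhausted.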

Copng-py

bug
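The growth described in the memory-leak report above can be measured before it is debugged. The following is a minimal sketch, not taken from the issue: `process_image` is a hypothetical stand-in for one OCR call, and `measure_growth` is an illustrative helper name. It uses the standard-library `tracemalloc`, which only tracks allocations made through Python's allocator; memory held by native inference code would have to be observed through the process's resident memory instead.

```python
import gc
import tracemalloc


def process_image(image_id: int) -> str:
    """Hypothetical stand-in for one OCR call; allocates a buffer and returns a small result."""
    scratch = [image_id] * 10_000  # temporary working buffer, freed when the function returns
    return f"result-{image_id}-{len(scratch)}"


def measure_growth(n_images: int, report_every: int = 100) -> list[int]:
    """Process images in a loop and record traced Python memory (bytes) at fixed intervals."""
    tracemalloc.start()
    samples = []
    for i in range(n_images):
        process_image(i)
        if (i + 1) % report_every == 0:
            gc.collect()  # drop collectable garbage before sampling
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    tracemalloc.stop()
    return samples


if __name__ == "__main__":
    for idx, used in enumerate(measure_growth(500), start=1):
        print(f"sample {idx}: {used} bytes traced")
```

If the samples keep climbing while the per-image work is unchanged, something is retaining references between iterations; flat samples alongside a growing process size would point toward native allocations instead.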

Integration to ocrmypdf

2

PaddleOCR seems to be a very nice way to OCR documents. There is a project called ocrmypdf https://github.com/ocrmypdf/OCRmyPDF which has a plugin system where hOCR-compliant OCR engines can be integrated (it is...

savikko

Code PR is needed

Add a rare-character model

5

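## Background Following the requirements collection (https://github.com/PaddlePaddle/PaddleOCR/issues/10334) and the discussion at the weekly technical seminar (https://github.com/PaddlePaddle/PaddleOCR/issues/10223), we settled on the task of adding a rare-character model. ## Steps 1. Replace the existing dictionary txt with a dictionary extended with the 《通用规范汉字表》 (Table of General Standard Chinese Characters). 2. Balance the corpus of the existing dataset through data synthesis such as copy-paste, and retrain the PPOCRV3 detection and recognition models. 3. Compare the retrained models' detection and recognition accuracy on common text and on rare characters against the best PPOCRV3 models; the target is unchanged or higher accuracy on common characters and further improved accuracy on rare characters. 5. Submit a PR to ppocr to replace the best model.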

shiyutang
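The comparison in step 3 of the issue above needs accuracy reported separately for common text and for rare characters. A minimal sketch of one way to do that split follows. It assumes exact-match accuracy on recognition output and treats a sample as "rare" when its label contains any character from a given rare set; both choices, and the name `split_accuracy`, are illustrative assumptions and not the project's evaluation code.

```python
from typing import Iterable, Optional, Set, Tuple


def split_accuracy(
    pairs: Iterable[Tuple[str, str]], rare_chars: Set[str]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (common_accuracy, rare_accuracy) over (label, prediction) pairs.

    A sample is counted as rare if its label contains at least one rare character.
    Accuracy is exact string match; a subset with no samples yields None.
    """
    common_hits = common_total = rare_hits = rare_total = 0
    for label, prediction in pairs:
        is_rare = any(ch in rare_chars for ch in label)
        hit = int(label == prediction)
        if is_rare:
            rare_total += 1
            rare_hits += hit
        else:
            common_total += 1
            common_hits += hit
    common_acc = common_hits / common_total if common_total else None
    rare_acc = rare_hits / rare_total if rare_total else None
    return common_acc, rare_acc


if __name__ == "__main__":
    demo = [("你好", "你好"), ("龘", "龙"), ("龘龘", "龘龘"), ("世界", "世届")]
    print(split_accuracy(demo, rare_chars={"龘"}))
```

Running the same function on the outputs of the baseline and the retrained model gives the two pairs of numbers the step asks to compare.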

TritonModelException: inference request batch-size must be <= 128 for 'cls_pp'

3

- System Environment: - Version: Paddle: - PaddleOCR: Related components: - Command Code: ``` FROM registry.baidubce.com/paddlepaddle/fastdeploy:1.0.7-gpu-cuda11.4-trt8.5-21.10 COPY ./models-gpu.tar /ocr_serving/ WORKDIR /ocr_serving RUN tar -xf models-gpu.tar RUN rm models-gpu.tar EXPOSE 8000 CMD...

sheiy

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

3

Hi, I am trying to run the following code: ``` python3 table/predict_table.py --image_dir=/scratch/rrs99/PaddleOCR/ppstructure/page_4.jpg \ --det_limit_side_len=736 \ --rec_model_dir=/scratch/rrs99/PaddleOCR/ppstructure/inference/en_ppocr_mobile_v2.0_table_rec_infer \ --table_model_dir=/scratch/rrs99/PaddleOCR/ppstructure/inference/en_ppocr_mobile_v2.0_table_structure_infer \ --det_model_dir=/scratch/rrs99/PaddleOCR/ppstructure/inference/en_ppocr_mobile_v2.0_table_det_infer \ --rec_char_dict_path=/scratch/rrs99/PaddleOCR/ppocr/utils/dict/table_dict.txt \ --table_char_dict_path=/scratch/rrs99/PaddleOCR/ppocr/utils/dict/table_structure_dict.txt \ --det_limit_type=min \ --output=/scratch/rrs99/PaddleOCR/ppstructure/output/table ```...

rudra0713

AttributeError: 'bool' object has no attribute 'sum' when `use_gpu: False`

2

Please provide the following information to quickly locate the problem - System Environment: Python 3.12 - Version - Paddle: 2.6.1 - PaddleOCR: 2.7.3 - PaddleNLP: 2.6.1 (also tried 2.8.0, 2.5.x, 2.7.x) - Related...

ltbd78

bug

(InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 256, 30, 16] and the shape of Y = [1, 256, 30, 15]. Received [16] in X is not equal to [15] in Y at i:3. [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at ..\paddle/phi/kernels/funcs/common_shape.h:86) [operator < elementwise_add > error]

1

Please provide the following information to quickly locate the problem - System Environment: - Version: Paddle: PaddleOCR: Related components: - Command Code: ``` Global: use_gpu: false epoch_num: 100 log_smooth_window: 20 print_batch_step:...

piarosebelledelapaz
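The hint in the broadcast error above states the rule being enforced: on each axis the two dimensions must be equal, or one of them must be at most 1. The sketch below applies that same condition in plain Python to the two shapes from the error title. The helper name `first_broadcast_mismatch` is an illustrative assumption, and the sketch only handles shapes of equal rank; it does not say why the two feature maps ended up 16 and 15 wide.

```python
from typing import Optional, Sequence


def first_broadcast_mismatch(
    x_shape: Sequence[int], y_shape: Sequence[int]
) -> Optional[int]:
    """Return the first axis where two equal-rank shapes cannot broadcast, or None."""
    if len(x_shape) != len(y_shape):
        raise ValueError("this sketch only handles shapes of equal rank")
    for i, (x_dim, y_dim) in enumerate(zip(x_shape, y_shape)):
        # Same condition as the hint in the error message.
        if not (x_dim == y_dim or x_dim <= 1 or y_dim <= 1):
            return i
    return None


if __name__ == "__main__":
    # Shapes copied from the error message; the mismatch is on axis 3.
    print(first_broadcast_mismatch([1, 256, 30, 16], [1, 256, 30, 15]))  # prints 3
```

The reported axis matches the `i:3` in the message, which is the last (width) dimension of the two operands passed to `elementwise_add`.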

Does PaddleOCR need a configuration file to be specified when running inference only with other local models?

2

The model list at https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/models_list.md mentions configuration files; for example, the configuration file for the ch_PP-OCRv4_server_det model is https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/det/ch_PP-OCRv4/ch_PP-OCRv4_det_teacher.yml. However, the official inference script does not mention that a configuration file needs to be specified: ```shell python3 predict_system.py \ --image_dir=./docs/table/1.png \ --det_model_dir=inference/en_PP-OCRv3_det_infer \ --rec_model_dir=inference/en_PP-OCRv3_rec_infer \ --rec_char_dict_path=../ppocr/utils/en_dict.txt \ --table_model_dir=inference/en_ppstructure_mobile_v2.0_SLANet_infer \ --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \ --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_infer \ --layout_dict_path=../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt \ --vis_font_path=../doc/fonts/simfang.ttf \ --recovery=True \ --output=../output/...

thejiangcj

SimpleDataSet repeatedly raises KeyError: 'valid_ratio' when training with ch_PP-OCRv4_rec_distill.yml

4

Please provide the following information to quickly locate the problem - System Environment: Windows 10 - Version: Paddle: 2.4.2.post116 PaddleOCR: 2.7.3 Related components: - Command Code: - Complete Error Message: [2024/05/22 14:06:59] ppocr...

zhenhuamo

bug
