Xiaomeng Zhao
## Motivation Merge adjacent and overlapping detection boxes to optimize text region detection in the document. Post-processing of text boxes is enhanced by consolidating them into larger text lines,...
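The merging idea described above can be illustrated with a minimal sketch. It is not the code from the pull request: the `(x0, y0, x1, y1)` box layout, the `x_gap` tolerance, and the function names `boxes_touch` and `merge_boxes` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
Box = Tuple[float, float, float, float]


def boxes_touch(a: Box, b: Box, x_gap: float = 0.0) -> bool:
    """True when a and b share vertical extent and overlap, or sit within x_gap, horizontally."""
    vertical_overlap = min(a[3], b[3]) - max(a[1], b[1])
    horizontal_gap = max(a[0], b[0]) - min(a[2], b[2])  # negative when the boxes overlap
    return vertical_overlap > 0 and horizontal_gap <= x_gap


def merge_boxes(boxes: List[Box], x_gap: float = 0.0) -> List[Box]:
    """Greedily union boxes that touch until no pair can be merged."""
    merged: List[Box] = []
    for box in sorted(boxes):
        current = box
        changed = True
        while changed:  # re-scan because a grown box may now touch earlier boxes
            changed = False
            for i, other in enumerate(merged):
                if boxes_touch(current, other, x_gap):
                    current = (
                        min(current[0], other[0]),
                        min(current[1], other[1]),
                        max(current[2], other[2]),
                        max(current[3], other[3]),
                    )
                    merged.pop(i)
                    changed = True
                    break
        merged.append(current)
    return merged
```

Raising `x_gap` in this sketch joins word-level boxes on the same line into a single line-level box, while boxes on different lines stay separate because they share no vertical extent.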
fix(pdf-extract): adjust box threshold for OCR detection to fix issue about OCR mode lost some line
## Motivation Tuned the detection box threshold parameter in the OCR model initialization to improve the accuracy of text extraction from images. The threshold was modified from 0.6 to 0.3...
- Ensuring that table captions are properly included in the output. - Remove the redundant `table_caption` variable. Thanks for your contribution and we appreciate it a lot. The following instructions...
## Motivation Optimize the language detection logic to enhance content formatting. This change addresses issues with long word segmentation. Language detection now uses a threshold to determine the language of...
## Motivation 1. Change the Docker default config from CPU to CUDA. 2. Add model file download logic to the Docker build. 3. Use Ubuntu 22.04 instead of latest. 4. merge some...
## Motivation Add start_page_id and end_page_id arguments to various components of the PDF parsing pipeline to support pagination functionality. This feature allows users to specify the range of pages to...
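A page-range selection of this kind might look like the sketch below. Only the argument names `start_page_id` and `end_page_id` come from the description; the helper name `select_pages`, the 0-based inclusive indexing, and the treatment of `None` as "through the last page" are assumptions, not the pipeline's confirmed behavior.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_pages(
    pages: Sequence[T],
    start_page_id: int = 0,
    end_page_id: Optional[int] = None,
) -> List[T]:
    """Return pages start_page_id..end_page_id, inclusive and 0-based (assumed convention)."""
    if not pages:
        return []
    last = len(pages) - 1
    # Assumption: a missing or too-large end_page_id means "up to the last page".
    end = last if end_page_id is None else min(end_page_id, last)
    if start_page_id < 0 or start_page_id > end:
        raise ValueError(f"invalid page range: {start_page_id}..{end}")
    return list(pages[start_page_id : end + 1])
```

Clamping the end index rather than raising keeps a call such as `select_pages(pages, 3, 99)` usable on short documents; a stricter implementation could reject it instead.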
### 🔎 Search before asking - [X] I have searched the PaddleOCR [Docs](https://paddlepaddle.github.io/PaddleOCR/) and found no similar bug report. - [X] I have searched the PaddleOCR [Issues](https://github.com/PaddlePaddle/PaddleOCR/issues) and found no...
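We have recently received multiple reports of failures when compiling and installing the simsimd library. After verification, the prebuilt packages of the new simsimd version require glibc 2.28 or later on x86-64 Linux. Based on the data collected, some system versions that meet this requirement are: 1. CentOS/RHEL: CentOS 8 and RHEL 8 and later 2. Ubuntu: Ubuntu 20.04 LTS (Focal Fossa) and later 3. Debian: Debian 10 (Buster) and later 4. Fedora: Fedora 30 and later. If your system version is older, you can first install ```...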
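To check whether a machine meets the glibc 2.28 requirement before installing, a small sketch like the following may help. It relies on the standard library's `platform.libc_ver()`; the helper names `parse_version` and `glibc_is_new_enough` are hypothetical and not part of simsimd or magic-pdf.

```python
import platform
from typing import Tuple

MIN_GLIBC = (2, 28)  # minimum glibc version stated in the notice above


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.31' into (2, 31); non-numeric parts are skipped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_new_enough(version_text: str, minimum: Tuple[int, ...] = MIN_GLIBC) -> bool:
    """Compare versions as integer tuples so that '2.9' sorts below '2.28'."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    lib, version = platform.libc_ver()  # e.g. ('glibc', '2.31'); ('', '') if undetected
    if lib != "glibc":
        print("glibc not detected; this check does not apply")
    elif glibc_is_new_enough(version):
        print(f"glibc {version} meets the 2.28 requirement")
    else:
        print(f"glibc {version} is older than 2.28")
```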
The official YOLO library ultralytics released four versions containing malicious software between 2024/12/02 and 2024/12/07, so users who installed magic-pdf during this period are also affected. All users who performed a new installation in this period should check the version of ultralytics in their pip environment; if it is 8.3.41 / 8.3.42 / 8.3.45 / 8.3.46, please uninstall it as soon as possible and install 8.3.47 or later.
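One way to check the installed ultralytics version from Python is sketched below using the standard library's `importlib.metadata`. The affected version list is taken from the notice above; the helper names `is_compromised` and `installed_ultralytics_version` are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Affected versions as listed in the notice above.
COMPROMISED = {"8.3.41", "8.3.42", "8.3.45", "8.3.46"}


def is_compromised(version: str) -> bool:
    """True when the given ultralytics version is one of the affected releases."""
    return version in COMPROMISED


def installed_ultralytics_version() -> Optional[str]:
    """Return the installed ultralytics version, or None when it is not installed."""
    try:
        return metadata.version("ultralytics")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_ultralytics_version()
    if version is None:
        print("ultralytics is not installed in this environment")
    elif is_compromised(version):
        print(f"ultralytics {version} is affected: uninstall it and install 8.3.47 or later")
    else:
        print(f"ultralytics {version} is not one of the affected versions")
```

Run the script with the same interpreter that magic-pdf uses, since each virtual environment has its own set of installed packages.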