`Bus error` when using `korean` language with `paddleocr` on macOS CLI
🔎 Search before asking
- [x] I have searched the PaddleOCR Docs and found no similar bug report.
- [x] I have searched the PaddleOCR Issues and found no similar bug report.
- [x] I have searched the PaddleOCR Discussions and found no similar bug report.
🐛 Bug (problem description)
Description:
I am encountering a bus error when running OCR with the `korean` language model using `paddleocr` from the macOS command line. The `fr` (French) language model works correctly without any issues.
Steps to Reproduce:
- Successful case (French language, works as expected):

      paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png \
          --lang fr \
          --use_doc_orientation_classify False \
          --use_doc_unwarping False \
          --use_textline_orientation False \
          --save_path ./output \
          --device gpu:0

  Output (excerpt; successful OCR output, including the `res` dictionary with recognized text):

      /opt/anaconda3/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:715: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
        warnings.warn(warning_message)
      Creating model: ('PP-OCRv5_server_det', None)
      Using official model (PP-OCRv5_server_det), the model files will be automatically downloaded and saved in /Users/kimuj5090/.paddlex/official_models.
      Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 2728.59it/s]
      E0715 22:09:09.284237 233791232 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
      Creating model: ('latin_PP-OCRv5_mobile_rec', None)
      Using official model (latin_PP-OCRv5_mobile_rec), the model files will be automatically downloaded and saved in /Users/kimuj5090/.paddlex/official_models.
      Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 1406.15it/s]
      E0715 22:09:09.653545 233791232 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
      Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png ...
      Downloading general_ocr_french01.png ...
      [==================================================] 100.00%
      [2025/07/15 22:09:15] paddleocr INFO: Processed item 0 in 6256.869316101074 ms
      {'res': {'input_path': '/Users/kimuj5090/.paddlex/predict_input/general_ocr_french01.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': False}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[119, 23], ..., [118, 75]], ..., [[109, 506], ..., [108, 556]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['mifere; la profpérité & les fuccès ac-', 'compagnent l’homme induftrieux.', 'Quel eft celui qui a acquis des ri-', 'cheffes, qui eft devenu puiffant, qui', 's’eft couvert de gloire, dont l’éloge', 'retentit par-tout, qui fiege au confeil', "du Roi? C'eft celui qui bannit la pa-", "reffe de fa maifon, & qui a dit à l'oifi-", 'veté : tu es mon ennemie.'], 'rec_scores': array([0.98409891, ..., 0.98091096]), 'rec_polys': array([[[119, 23], ..., [118, 75]], ..., [[109, 506], ..., [108, 556]]], dtype=int16), 'rec_boxes': array([[118, ..., 81], ..., [108, ..., 562]], dtype=int16)}}

- Failing case (Korean language, results in `bus error`):

      paddleocr ocr -i 01_original.png \
          --lang korean \
          --use_doc_orientation_classify False \
          --use_doc_unwarping False \
          --use_textline_orientation False \
          --save_path ./output \
          --device gpu:0

  Output (excerpt):

      /opt/anaconda3/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:715: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
        warnings.warn(warning_message)
      Creating model: ('PP-OCRv5_server_det', None)
      Using official model (PP-OCRv5_server_det), the model files will be automatically downloaded and saved in /Users/kimuj5090/.paddlex/official_models.
      Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 2502.07it/s]
      E0715 22:10:31.702410 233791232 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
      Creating model: ('korean_PP-OCRv5_mobile_rec', None)
      Using official model (korean_PP-OCRv5_mobile_rec), the model files will be automatically downloaded and saved in /Users/kimuj5090/.paddlex/official_models.
      Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 1404.19it/s]
      E0715 22:10:32.201907 233791232 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
      zsh: bus error paddleocr ocr -i 01_original.png --lang korean --use_doc_orientation_classify
      /opt/anaconda3/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
        warnings.warn('resource_tracker: There appear to be %d '

  Note: `01_original.png` is a local image file containing Korean text.
Expected And Actual:
When using --lang korean, the command should execute successfully and produce OCR results for the Korean text, similar to how it functions for the French language.
But the process terminates with a bus error when --lang korean is specified.
My Thoughts:
I suspect a memory management issue specific to the Korean language model or its dependencies on macOS, which would explain the bus error.
🏃‍♂️ Environment (runtime environment)
- Operating System: macOS (CLI)
- PaddlePaddle Version: `3.1.0` (compiled with `with_gpu: OFF`)
- PaddleOCR Version: `3.1.0`
- Python Version: `3.12` (from Anaconda)
- GPU: Not used (as `with_gpu: OFF` in PaddlePaddle compilation)
🌰 Minimal Reproducible Example (smallest demo that reproduces the problem)
```bash
paddleocr ocr -i 01_original.png \
--lang korean \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--save_path ./output \
--device gpu:0
```
Note: `01_original.png` is a local image file containing Korean text.
+1
Getting the same error on
paddleocr ocr -i ./general_formula_recognition_001.png --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False
[1] 76346 bus error paddleocr ocr -i .junk/test/2310.06825v1_page-0006.jpg False False False
/Users/hash/.local/share/uv/python/cpython-3.13.3-macos-aarch64-none/lib/python3.13/multiprocessing/resource_tracker.py:301: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown: {'/loky-76346-fxjfm6qz'}

- Operating System: macOS Sonoma
- PaddlePaddle Version: 3.1.0
- PaddleOCR Version: 3.1.0
- Python Version: 3.13
But this example works somehow!
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png \
--lang fr \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--save_path ./output \
--device gpu:0
After further analysis: it works on small images. See here: https://drive.google.com/file/d/11ULHdDxbJ2IxGa9WoX3Pby_NbsDrh_lb/view?usp=sharing
@TingquanGao what do you think?
Thank you for your feedback! We will investigate the issue as soon as possible. To help us pinpoint the problem, could you please provide the image that triggered the error? Additionally, please let us know whether your Mac is equipped with an Apple Silicon (M series) chip or an Intel CPU. This information will help us reproduce and resolve the issue more efficiently. Thank you!
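For anyone replying with the requested details, the chip type and package versions can be collected from Python itself. This is a minimal sketch using only the standard library; the helper name `describe_host` is made up for illustration, and the distribution names are assumed to match the pip package names mentioned in this thread.

```python
import platform
from importlib import metadata


def describe_host(packages=("paddlepaddle", "paddleocr", "paddlex")):
    """Collect OS, CPU architecture, Python version, and package versions."""
    info = {
        "system": platform.system(),        # "Darwin" on macOS
        "machine": platform.machine(),      # "arm64" on Apple Silicon, "x86_64" on Intel
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed in this environment
    return info


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```

Note that `platform.machine()` describes the running interpreter, so an x86_64 Python running under Rosetta on an M-series Mac would report `x86_64`.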
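I ran into the same error as well.
Device: Mac mini M4. Versions: paddleocr==3.2.0, paddlepaddle==3.1.1. Model used: en_PP-OCRv5_mobile_rec
Sorry, I can't share the image because it is a screenshot of an internal document. I found that the error does not occur at 72 dpi, but it does occur at 144 dpi and above.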
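Since 72 dpi captures work while 144 dpi ones crash, and an earlier comment found small images unaffected, one stopgap worth trying is shrinking the input before OCR. The sketch below only computes the target dimensions; `fit_within` and the 2000-pixel cap are illustrative assumptions, not values taken from PaddleOCR, and the actual resize would be done with whatever image library is already in use. It is a workaround to test the size hypothesis, not a fix.

```python
def fit_within(width, height, max_side=2000):
    """Return (new_width, new_height) scaled so the longer side is <= max_side.

    Aspect ratio is preserved; images already small enough are returned unchanged.
    max_side=2000 is an arbitrary illustrative cap, not a PaddleOCR threshold.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 144 dpi capture has twice the pixel dimensions of the same page at 72 dpi.
print(fit_within(2880, 1800))  # (2000, 1250)
print(fit_within(1440, 900))   # (1440, 900)
```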
I ran into the same error as well. MacBook M1 Pro
I ran into the same error as well.
Device: Mac mini M1. Versions: paddlepaddle 3.0.0, paddlex 3.2.0. Model used: PP-OCRv5_server_rec
I also get this error when trying to use Thai on macOS with an M2 chip. I am using PaddleOCR 3.2.0.
zsh: bus error env OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 python -m
/miniconda3/envs/ocr-paddle/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
I believe this issue has been resolved in PR #75731. I recommend installing the latest nightly build of PaddlePaddle and retrying your code. Your feedback would be greatly appreciated.
@scyyh11 Hi, thanks for pointing me to PR #75731. I've tested with the latest versions but I'm still experiencing bus errors on macOS ARM:
Environment:
- PaddlePaddle: 3.2.0 (commit: e22e2f9af7eeced7e3c9582ddb69a617887d3eb9)
- PaddleOCR: 3.2.0
- PaddleX: 3.2.0
- macOS: macOS Sequoia ver 15.6.1
- Chip: Apple Silicon M2(ARM)
- Python: 3.10
- Issue: The application crashes with a bus error when running OCR inference, specifically after model initialization completes. The crash occurs during the first `predict()` call.
Environment variables set:
OMP_NUM_THREADS=1
OPENBLAS_NUM_THREADS=1
MKL_NUM_THREADS=1
FLAGS_use_mkldnn=False
KMP_DUPLICATE_LIB_OK=True
PADDLEX_OFFLINE_MODE=1
Output:
zsh: bus error python -m ocr_bench.run_benchmark ...
/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker:
There appear to be 1 leaked semaphore objects to clean up at shutdown
The crash happens consistently when trying to process images with Thai language models (th_PP-OCRv5_mobile_rec). Is there additional configuration needed for macOS ARM, or is this a different issue from what was fixed in PR #75731? Any guidance would be appreciated.
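Because a bus error kills the whole interpreter, a benchmark harness like the one above cannot catch it with `try`/`except`. One way to keep the harness alive, sketched here under assumptions, is to run each OCR job in a child process and inspect its exit status: on POSIX, `subprocess` reports a child killed by signal N as return code -N. The helper name `run_isolated` and the placeholder command are hypothetical; the thread limits are the ones listed in this comment.

```python
import os
import signal
import subprocess
import sys


def run_isolated(cmd, extra_env=None):
    """Run cmd in a child process and report whether it died from SIGBUS.

    Returns (returncode, killed_by_bus_error). POSIX only: signal.SIGBUS
    does not exist on Windows.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    completed = subprocess.run(cmd, env=env)
    return completed.returncode, completed.returncode == -signal.SIGBUS


if __name__ == "__main__":
    # Thread limits copied from the comment above; the command is a placeholder.
    limits = {
        "OMP_NUM_THREADS": "1",
        "OPENBLAS_NUM_THREADS": "1",
        "MKL_NUM_THREADS": "1",
    }
    placeholder = [sys.executable, "-c", "print('replace with the OCR command')"]
    code, bus_error = run_isolated(placeholder, limits)
    print(f"exit code {code}, bus error: {bus_error}")
```

This does not prevent the crash; it only lets the parent process record which input triggered it and continue with the next one.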
@KwinyarutP Hi, looking at your PaddlePaddle version, I think you're still using the latest released version. The fix is on the develop branch and hasn't been released yet. You can run the following command to install the latest nightly build:
pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
Thanks for your report. This issue has been fixed in the development branch. Please upgrade to the latest nightly build to get the fix.
First, you can check your current version with:
pip list | grep paddlepaddle
If you are on a stable release (e.g., 3.2.0), please install the latest development version:
pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
Please let us know if the issue persists after the upgrade.
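To confirm from Python whether the installed build is a stable release or a pre-release, a rough check along these lines may help. It assumes nightly wheels carry a PEP 440 development or pre-release marker (for example a `.devN` suffix), which has not been verified against the actual nightly index; both helper names are made up for illustration.

```python
import re
from importlib import metadata

# Leading numeric release segment, e.g. "3.2.0" in "3.2.0.dev20250101".
_RELEASE = re.compile(r"^\d+(?:\.\d+)*")


def installed_version(dist="paddlepaddle"):
    """Return the installed version string of dist, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def is_prerelease(version):
    """Heuristic: True if the version has a PEP 440 a/b/rc/dev marker."""
    public = version.split("+", 1)[0]          # drop any local version label
    rest = _RELEASE.sub("", public, count=1)   # whatever follows the release digits
    return bool(re.search(r"(?:a|b|rc|dev)\d*", rest, re.IGNORECASE))


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("paddlepaddle is not installed")
    else:
        kind = "pre-release/nightly" if is_prerelease(version) else "stable release"
        print(f"paddlepaddle {version} looks like a {kind}")
```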
This issue has had no response for a long time and will be closed. You can reopen it or open a new issue if you still have questions.
From Bot