
`Bus error` when using `korean` language with `paddleocr` on macOS CLI

Open KimEJ opened this issue 5 months ago • 11 comments

🔎 Search before asking

  • [x] I have searched the PaddleOCR Docs and found no similar bug report.
  • [x] I have searched the PaddleOCR Issues and found no similar bug report.
  • [x] I have searched the PaddleOCR Discussions and found no similar bug report.

๐Ÿ› Bug (้—ฎ้ข˜ๆ่ฟฐ)

Description:

I am encountering a bus error when attempting to perform OCR with the korean language model using paddleocr on my macOS command-line interface. The fr (French) language model works correctly without any issues.

Steps to Reproduce:

  1. Successful Case (French Language - works as expected):

    paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png \
        --lang fr \
        --use_doc_orientation_classify False \
        --use_doc_unwarping False \
        --use_textline_orientation False \
        --save_path ./output \
        --device gpu:0
    

    Output (Excerpt): (Successful OCR output, including res dictionary with recognized text.)

    /opt/anaconda3/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:715: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
      warnings.warn(warning_message)
    Creating model: ('PP-OCRv5_server_det', None)
    Using official model (PP-OCRv5_server_det), the model files will be automatically downloaded and saved in /Users/kimuj5090/.paddlex/official_models.
    Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 2728.59it/s]
    E0715 22:09:09.284237 233791232 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
    Creating model: ('latin_PP-OCRv5_mobile_rec', None)
    Using official model (latin_PP-OCRv5_mobile_rec), the model files will be automatically downloaded and saved in /Users/kimuj5090/.paddlex/official_models.
    Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 1406.15it/s]
    E0715 22:09:09.653545 233791232 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
    Connecting to https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png ...
    Downloading general_ocr_french01.png ...
    [==================================================] 100.00%
    [2025/07/15 22:09:15] paddleocr INFO: Processed item 0 in 6256.869316101074 ms
    {'res': {'input_path': '/Users/kimuj5090/.paddlex/predict_input/general_ocr_french01.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': False}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[119,  23], ..., [118,  75]], ..., [[109, 506], ..., [108, 556]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['mifere; la profpérité & les fuccès ac-', 'compagnent l’homme induftrieux.', 'Quel eft celui qui a acquis des ri-', 'cheffes, qui eft devenu puiffant, qui', 's’eft couvert de gloire, dont l’éloge', 'retentit par-tout, qui fiege au confeil', "du Roi? C'eft celui qui bannit la pa-", "reffe de fa maifon, & qui a dit à l'oifi-", 'veté : tu es mon ennemie.'], 'rec_scores': array([0.98409891, ..., 0.98091096]), 'rec_polys': array([[[119,  23], ..., [118,  75]], ..., [[109, 506], ..., [108, 556]]], dtype=int16), 'rec_boxes': array([[118, ...,  81], ..., [108, ..., 562]], dtype=int16)}}
    
  2. Failing Case (Korean Language - results in bus error):

    paddleocr ocr -i 01_original.png \
        --lang korean \
        --use_doc_orientation_classify False \
        --use_doc_unwarping False \
        --use_textline_orientation False \
        --save_path ./output \
        --device gpu:0
    

    Output (Excerpt):

    /opt/anaconda3/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:715: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
      warnings.warn(warning_message)
    Creating model: ('PP-OCRv5_server_det', None)
    Using official model (PP-OCRv5_server_det), the model files will be automatically downloaded and saved in /Users/kimuj5090/.paddlex/official_models.
    Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 2502.07it/s]
    E0715 22:10:31.702410 233791232 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
    Creating model: ('korean_PP-OCRv5_mobile_rec', None)
    Using official model (korean_PP-OCRv5_mobile_rec), the model files will be automatically downloaded and saved in /Users/kimuj5090/.paddlex/official_models.
    Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 1404.19it/s]
    E0715 22:10:32.201907 233791232 analysis_config.cc:169] Please use PaddlePaddle with GPU version.
    zsh: bus error  paddleocr ocr -i 01_original.png --lang korean --use_doc_orientation_classify
    /opt/anaconda3/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
      warnings.warn('resource_tracker: There appear to be %d '
    

    Note: 01_original.png is a local image file containing Korean text.

Expected And Actual:

When using --lang korean, the command should execute successfully and produce OCR results for the Korean text, just as it does for French. Instead, the process terminates with a bus error whenever --lang korean is specified.

My Thoughts:

I think it is possible that there's a memory management issue specific to the Korean language model or its dependencies on macOS, leading to the bus error.

๐Ÿƒโ€โ™‚๏ธ Environment (่ฟ่กŒ็Žฏๅขƒ)

  • Operating System: macOS (cli)
  • PaddlePaddle Version: 3.1.0 (compiled with with_gpu: OFF)
  • PaddleOCR Version: 3.1.0
  • Python Version: 3.12 (from Anaconda)
  • GPU: Not used (as with_gpu: OFF in PaddlePaddle compilation)
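A minimal sketch for collecting these environment details programmatically, which may help when comparing reports across machines. It uses only the Python standard library. The helper names `package_version` and `collect_environment` are illustrative and not part of PaddleOCR, and the distribution names `paddlepaddle`, `paddleocr`, and `paddlex` are assumed to match how the packages were installed with pip.

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_environment() -> dict:
    """Gather the fields typically requested in a bug report."""
    return {
        "os": platform.system(),          # "Darwin" on macOS
        "os_release": platform.release(),
        # Typically "arm64" on Apple Silicon and "x86_64" on Intel
        # (a Python running under Rosetta may also report "x86_64").
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "paddlepaddle": package_version("paddlepaddle"),
        "paddleocr": package_version("paddleocr"),
        "paddlex": package_version("paddlex"),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

The `machine` field distinguishes Apple Silicon from Intel Macs, which is useful when trying to reproduce a crash that may be architecture-specific.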

🌰 Minimal Reproducible Example (Minimal Demo That Reproduces the Problem)

```bash
paddleocr ocr -i 01_original.png \
    --lang korean \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --save_path ./output \
    --device gpu:0
```

Note: 01_original.png is a local image file containing Korean text.

KimEJ avatar Jul 15 '25 13:07 KimEJ

+1

Getting the same error on

paddleocr ocr -i ./general_formula_recognition_001.png --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False

    [1] 76346 bus error paddleocr ocr -i .junk/test/2310.06825v1_page-0006.jpg False False False
    /Users/hash/.local/share/uv/python/cpython-3.13.3-macos-aarch64-none/lib/python3.13/multiprocessing/resource_tracker.py:301: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown: {'/loky-76346-fxjfm6qz'}

  • Operating System: macOS Sonoma
  • PaddlePaddle Version: 3.1.0
  • PaddleOCR Version: 3.1.0
  • Python Version: 3.13

But this example works somehow!

paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_french01.png \
    --lang fr \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --save_path ./output \
    --device gpu:0

Anandesh-Sharma avatar Jul 16 '25 22:07 Anandesh-Sharma

On further analysis, it works on small images. Example here: https://drive.google.com/file/d/11ULHdDxbJ2IxGa9WoX3Pby_NbsDrh_lb/view?usp=sharing

@TingquanGao what do you think?

Anandesh-Sharma avatar Jul 16 '25 22:07 Anandesh-Sharma

Thank you for your feedback! We will investigate the issue as soon as possible. To help us pinpoint the problem, could you please provide the image that triggered the error? Additionally, please let us know whether your Mac is equipped with an Apple Silicon (M series) chip or an Intel CPU. This information will help us reproduce and resolve the issue more efficiently. Thank you!

TingquanGao avatar Aug 21 '25 08:08 TingquanGao

I ran into the same error.

Device: Mac mini M4. Versions: paddleocr==3.2.0, paddlepaddle==3.1.1. Model used: en_PP-OCRv5_mobile_rec

Sorry, I can't share the image; it is a screenshot of an internal document. I found that the error does not occur at 72 dpi, but it does occur at 144 dpi and above.
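Since the crash is reported here to depend on image resolution, and earlier in the thread on image size, one way to test that observation is to downscale the input before running OCR. The following is a rough diagnostic sketch, not a confirmed fix. The function names and the default `max_side` of 1280 pixels are arbitrary assumptions, and `downscale_image` requires Pillow, which is imported lazily so that the size calculation works without it.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is preserved; sizes that already fit are returned unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def downscale_image(src_path: str, dst_path: str, max_side: int = 1280) -> tuple[int, int]:
    """Write a downscaled copy of src_path to dst_path (requires Pillow)."""
    from PIL import Image  # lazy import: only needed when actually resizing

    with Image.open(src_path) as img:
        new_size = fit_within(img.width, img.height, max_side)
        if new_size != img.size:
            img = img.resize(new_size, Image.LANCZOS)
        img.save(dst_path)
    return new_size
```

If the downscaled copy is processed without a bus error while the original still crashes, that supports the size-dependence observation and gives maintainers a narrower case to reproduce.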

alfred-liu96 avatar Aug 22 '25 02:08 alfred-liu96

I ran into the same error. MacBook M1 Pro

timfengzi avatar Aug 27 '25 03:08 timfengzi

I ran into the same error.

Device: Mac mini M1. Versions: paddlepaddle 3.0.0, paddlex 3.2.0. Model used: PP-OCRv5_server_rec

Image

ZYHB avatar Aug 29 '25 09:08 ZYHB

I also got this error when trying to use Thai on macOS with an M2 chip. By the way, I'm using PaddleOCR 3.2.0.

zsh: bus error  env OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1     python -m 
/miniconda3/envs/ocr-paddle/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
 warnings.warn('resource_tracker: There appear to be %d '

KwinyarutP avatar Oct 16 '25 08:10 KwinyarutP

I believe this issue has been resolved in PR #75731. I recommend installing the latest nightly build of PaddlePaddle and retrying your code. Your feedback would be greatly appreciated.

scyyh11 avatar Oct 16 '25 11:10 scyyh11

@scyyh11 Hi, thanks for pointing me to PR #75731. I've tested with the latest versions but I'm still experiencing bus errors on macOS ARM:

Environment:

  • PaddlePaddle: 3.2.0 (commit: e22e2f9af7eeced7e3c9582ddb69a617887d3eb9)
  • PaddleOCR: 3.2.0
  • PaddleX: 3.2.0
  • macOS: macOS Sequoia ver 15.6.1
  • Chip: Apple Silicon M2(ARM)
  • Python: 3.10
  • Issue: The application crashes with a bus error when running OCR inference, specifically after model initialization completes. The crash occurs during the first predict() call.

Environment variables set:

OMP_NUM_THREADS=1
OPENBLAS_NUM_THREADS=1
MKL_NUM_THREADS=1
FLAGS_use_mkldnn=False
KMP_DUPLICATE_LIB_OK=True
PADDLEX_OFFLINE_MODE=1
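For anyone wanting to reproduce this configuration from a Python script rather than the shell, a minimal sketch follows. The variable names and values are copied from the list above; the `apply_env` helper is illustrative. Settings like these generally need to be in place before the native libraries are loaded, so the sketch applies them before any Paddle import. As reported in this comment, these settings did not prevent the crash.

```python
import os

# Values copied from the report above.
THREAD_ENV = {
    "OMP_NUM_THREADS": "1",
    "OPENBLAS_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "FLAGS_use_mkldnn": "False",
    "KMP_DUPLICATE_LIB_OK": "True",
    "PADDLEX_OFFLINE_MODE": "1",
}


def apply_env(settings: dict) -> None:
    """Export each setting into the current process environment."""
    for key, value in settings.items():
        os.environ[key] = value


apply_env(THREAD_ENV)

# Import paddle / paddleocr only after this point, so that the libraries
# see the variables when they initialise.
```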

Output:

zsh: bus error  python -m ocr_bench.run_benchmark ...
/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: 
There appear to be 1 leaked semaphore objects to clean up at shutdown

The crash happens consistently when trying to process images with Thai language models (th_PP-OCRv5_mobile_rec). Is there additional configuration needed for macOS ARM, or is this a different issue from what was fixed in PR #75731? Any guidance would be appreciated.

KwinyarutP avatar Oct 17 '25 03:10 KwinyarutP

@KwinyarutP Hi, looking at your PaddlePaddle version, I think you’re still using the latest released version. The fix is on the develop branch and hasn’t been released yet. You can run the following command to install the latest nightly build:

pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/

scyyh11 avatar Oct 17 '25 03:10 scyyh11

Thanks for your report. This issue has been fixed in the development branch. Please upgrade to the latest nightly build to get the fix.

First, you can check your current version with:

pip list | grep paddlepaddle

If you are on a stable release (e.g., 3.2.0), please install the latest development version:

pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/

Please let us know if the issue persists after the upgrade.
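A small sketch for checking from Python whether the installed build looks like a stable release or a pre-release. It assumes nightly builds carry a PEP 440 pre-release or `.dev` marker in their version string, which is what the `--pre` flag in the command above suggests but which is not confirmed in this thread. The function names and the sample version strings in the comments are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional

# Matches PEP 440 style markers such as "a1", "b2", "rc1" or ".dev20250101".
_PRE_RELEASE = re.compile(r"(?:a|b|rc|\.dev)\d+", re.IGNORECASE)


def is_pre_release(version: str) -> bool:
    """Return True if the version string carries a pre-release or dev marker."""
    return bool(_PRE_RELEASE.search(version))


def installed_paddle_version() -> Optional[str]:
    """Return the installed paddlepaddle version, or None if it is absent."""
    try:
        return metadata.version("paddlepaddle")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_paddle_version()
    if version is None:
        print("paddlepaddle is not installed")
    elif is_pre_release(version):
        print(f"{version}: looks like a pre-release or nightly build")
    else:
        print(f"{version}: looks like a stable release")
```

A stable version string such as `3.2.0` has no such marker, so under the assumption above it would indicate that the nightly build has not been installed yet.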

TingquanGao avatar Oct 18 '25 17:10 TingquanGao

This issue has had no response for a long time and will be closed. You can reopen it or open a new issue if you still have questions.


From Bot

TingquanGao avatar Nov 19 '25 03:11 TingquanGao