
No Korean Support in PP-OCRv5

Open dcleedx opened this issue 6 months ago • 4 comments

🔎 Search before asking

  • [x] I have searched the PaddleOCR Docs and found no similar bug report.
  • [x] I have searched the PaddleOCR Issues and found no similar bug report.
  • [x] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem description)

Hi PaddleOCR Team,

We at DEEPX have tested the PP-OCRv5 model and are very impressed with its performance — it shows a clear accuracy improvement over our current solution.

However, we noticed that Korean is not supported in this version. As we have significant demand in the Asia-Pacific market, Korean language support is crucial for our use cases.

We sincerely hope the PaddleOCR team can consider adding Korean support to PP-OCRv5 as soon as possible. Thank you for your great work!

Best regards, DEEPX Team

🏃‍♂️ Environment (Runtime environment)

  • OS: linux
  • environment: shell
  • paddleOCR: 3.0.0
  • install: pip
  • RAM: 125G
  • cpu: x86
  • GPU: RTX 3090

🌰 Minimal Reproducible Example (Minimal demo that reproduces the problem)

thanks

dcleedx avatar May 23 '25 09:05 dcleedx

Hello, thank you for your interest in PP-OCRv5. We are very happy to hear that it has greatly improved your specific scenario. As for the Korean language recognition you mentioned, PP-OCRv5 does not currently support it, but we will prioritize your request in the future.

cuicheng01 avatar May 23 '25 12:05 cuicheng01

Just for context: with PP-OCRv5 it attempts Korean anyway and fails, producing Chinese output instead:

PS C:\Users\user\Pictures\Screenshots> paddleocr ocr -i .\7.jpg --lang korean  --ocr_version PP-OCRv5
INFO: Could not find files for the given pattern(s).
C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\paddle\utils\cpp_extension\extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
Creating model: ('PP-LCNet_x1_0_doc_ori', None)
Using official model (PP-LCNet_x1_0_doc_ori), the model files will be automatically downloaded and saved in C:\Users\user\.paddlex\official_models.
Creating model: ('UVDoc', None)
Using official model (UVDoc), the model files will be automatically downloaded and saved in C:\Users\user\.paddlex\official_models.
Creating model: ('PP-LCNet_x0_25_textline_ori', None)
Using official model (PP-LCNet_x0_25_textline_ori), the model files will be automatically downloaded and saved in C:\Users\user\.paddlex\official_models.
Creating model: ('PP-OCRv5_mobile_det', None)
Using official model (PP-OCRv5_mobile_det), the model files will be automatically downloaded and saved in C:\Users\user\.paddlex\official_models.
Creating model: ('PP-OCRv5_mobile_rec', None)
Using official model (PP-OCRv5_mobile_rec), the model files will be automatically downloaded and saved in C:\Users\user\.paddlex\official_models.
[2025/05/27 13:50:13] paddleocr INFO: Processed item 0 in 4665.014028549194 ms
{'res': {'input_path': '.\\7.jpg', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': True, 'use_doc_unwarping': True}, 'angle': 0}, 'dt_polys': array([[[106, 108],
        ...,
        [106, 147]],

       ...,

       [[ 93, 594],
        ...,
        [ 92, 630]]], dtype=int16), 'text_det_params': {'limit_side_len': 736, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([0, ..., 0]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['外号', '外三全异外', '外三人', '予号号', '外三立', '日卫', '吗', '', '出立', '分', '590710', '{xlo', '外三', '-1914625', '4009-0534-9635-5027', '8.望早外异', '()1旱.', '。', ')', '外三()1早.', '恐是1早.'], 'rec_scores': array([0.57926583, ..., 0.66197598]), 'rec_polys': array([[[106, 108],
        ...,
        [106, 147]],

       ...,

       [[ 93, 594],
        ...,
        [ 92, 630]]], dtype=int16), 'rec_boxes': array([[106, ..., 150],
       ...,
       [ 92, ..., 637]], dtype=int16)}}
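
A quick way to spot this kind of script mismatch programmatically is to measure how much of the recognized text is actually Hangul. The sketch below is only an illustration: the function names are hypothetical, the sample strings are copied from the rec_texts output above, and it checks only the precomposed Hangul syllable block (U+AC00 to U+D7A3) and the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def script_ratio(texts, low, high):
    """Fraction of non-space characters whose code point lies in [low, high]."""
    chars = [c for t in texts for c in t if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if low <= c <= high)
    return hits / len(chars)


def hangul_ratio(texts):
    # Precomposed Hangul syllables only (assumption: jamo blocks are ignored).
    return script_ratio(texts, "\uac00", "\ud7a3")


def han_ratio(texts):
    # Basic CJK Unified Ideographs block only.
    return script_ratio(texts, "\u4e00", "\u9fff")


# A few strings taken from the rec_texts list in the output above.
sample = ["外号", "外三全异外", "外三人"]
print("hangul:", hangul_ratio(sample), "han:", han_ratio(sample))
```

A result with a Hangul ratio of zero on an image known to contain Korean is a strong hint that the recognizer in use was not a Korean model.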

I'd rather have it report Korean as unavailable until the feature actually exists, instead of writing incorrect output. That is what already happens with v4:

PS C:\Users\user\Pictures\Screenshots> paddleocr ocr -i .\7.jpg --lang korean  --ocr_version PP-OCRv4
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\Scripts\paddleocr.exe\__main__.py", line 7, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\paddleocr\__main__.py", line 26, in console_entry
    main()
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\paddleocr\_cli.py", line 118, in main
    _execute(args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\paddleocr\_cli.py", line 111, in _execute
    args.executor(args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\paddleocr\_pipelines\ocr.py", line 578, in execute_with_args
    perform_simple_inference(PaddleOCR, params)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\paddleocr\_utils\cli.py", line 53, in perform_simple_inference
    wrapper = wrapper_cls(**params)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\paddleocr\_pipelines\ocr.py", line 95, in __init__
    raise ValueError(
ValueError: No models are available for the language 'korean' and OCR version 'PP-OCRv4'.

Of course it would be best to see Korean support in paddleocr soon, but in the meantime I think it is better to report an error when --lang korean is requested.
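
The fail-fast behaviour requested here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the support table, the function name, and the returned model name are hypothetical and do not reflect PaddleOCR's real internals or its actual language matrix. Only the wording of the error message is taken from the v4 traceback above.

```python
# Hypothetical support table; the real language/version matrix lives in PaddleOCR.
SUPPORTED_LANGS = {
    "PP-OCRv4": {"ch", "en"},
    "PP-OCRv5": {"ch", "en"},
}


def resolve_rec_model(lang: str, ocr_version: str) -> str:
    """Return a recognizer name, or raise instead of silently falling back."""
    langs = SUPPORTED_LANGS.get(ocr_version)
    if langs is None:
        raise ValueError(f"Unknown OCR version {ocr_version!r}.")
    if lang not in langs:
        # Refuse the request rather than run a model for a different language.
        raise ValueError(
            f"No models are available for the language {lang!r} "
            f"and OCR version {ocr_version!r}."
        )
    # Placeholder name for illustration, not a real model identifier.
    return f"{ocr_version}_{lang}_rec_placeholder"


try:
    resolve_rec_model("korean", "PP-OCRv5")
except ValueError as err:
    print(err)
```

The point of the design is that an unsupported combination stops at construction time, so no downstream code ever sees text produced by the wrong recognizer.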

Thanks.

thejjw avatar May 27 '25 05:05 thejjw

I actually made a pull request to address the unsupported-language problem in v5. Please see whether you can integrate it: #15428

thejjw avatar May 27 '25 05:05 thejjw

Hi, feel free to submit your request directly in this issue: 👉 PaddleOCR Multilingual Support Request #15617

cuicheng01 avatar Jun 06 '25 11:06 cuicheng01

we will prioritize your request in the future.

Hello, @cuicheng01, could you share the expected timeline for Korean language support in PP-OCRv5? I might be able to assist with training if there is a clearer guideline. Thanks.

bit-scientist avatar Jun 25 '25 02:06 bit-scientist

we will prioritize your request in the future.

Hello, @cuicheng01, could you share the expected timeline for Korean language support in PP-OCRv5? I might be able to assist with training if there is a clearer guideline. Thanks.

@bit-scientist, as far as I remember, Korean support for PP-OCRv5 was added in the PaddleX 3.1.0 release, which came out just yesterday.

maakdan avatar Jun 29 '25 03:06 maakdan

@maakdan thanks for the update. I see the release is in the PaddleX repository, not here, but I hope PaddleOCR follows soon!

thejjw avatar Jun 29 '25 05:06 thejjw

v3.1.0 is out with support for Korean: https://github.com/PaddlePaddle/PaddleOCR/releases/tag/v3.1.0 They say Hangul is supported 👏

thejjw avatar Jun 29 '25 09:06 thejjw

Hello everyone, PP-OCRv5 now indeed supports Korean text recognition, with a significant improvement in accuracy compared to the previous generation model. We welcome everyone to use it and provide feedback. details

cuicheng01 avatar Jul 08 '25 04:07 cuicheng01

Thanks for the great work, @cuicheng01. Could you please check #15908? It's odd that Korean support is not fully integrated yet.

bit-scientist avatar Jul 08 '25 06:07 bit-scientist

Hello @bit-scientist, the issue you're experiencing may be due to models exported with PaddlePaddle 3.1 being incompatible with inference on PaddlePaddle 3.0. The incompatibility arises because version 3.1 includes additional checks to prevent operator precision errors. To resolve this, we have re-exported the models using PaddlePaddle 3.0 and updated the inference model accordingly. These updates have been tested and verified on multiple devices using both PaddlePaddle 3.0 and 3.1.

Please try running the updated model again, and we apologize for any inconvenience this may have caused.
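
For anyone who wants to guard against this class of mismatch on their own side, a minimal sketch is shown below. It assumes you know which PaddlePaddle version a model was exported with, which is not necessarily recorded anywhere; the function names are hypothetical and the version strings are passed in by hand (at runtime the installed version can be read from paddle.__version__).

```python
import re


def version_tuple(version: str) -> tuple:
    """Keep the leading numeric components, e.g. "3.1.0" -> (3, 1, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def exported_with_newer_minor(runtime: str, exported_with: str) -> bool:
    """True when the model was exported with a newer major.minor than the runtime."""
    return version_tuple(exported_with)[:2] > version_tuple(runtime)[:2]


# The situation described above: a 3.1 export running on a 3.0 installation.
if exported_with_newer_minor(runtime="3.0.0", exported_with="3.1.0"):
    print("Model was exported with a newer PaddlePaddle than the one installed.")
```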

cuicheng01 avatar Jul 09 '25 04:07 cuicheng01

While searching for issues related to the Korean model, I found this and wanted to leave a comment here.

Hello @cuicheng01 ,

Thanks for Korean support in v3.1.0!

However, the inference tools throw a ValueError for multilingual models such as korean_PP-OCRv5_mobile_rec, latin_PP-OCRv5_mobile_rec, and eslav_PP-OCRv5_mobile_rec.

PR submitted to fix: #16032

Could you check this PR?

Ea3124 avatar Jul 14 '25 09:07 Ea3124

Hi, I think we are still seeing issues similar to the one @bit-scientist reported. Please check #16055.

KimEJ avatar Jul 15 '25 13:07 KimEJ

Hello, this might have been caused by a previous bug, but in the latest version 3.2.0, this issue should no longer exist.

cuicheng01 avatar Sep 02 '25 13:09 cuicheng01

This issue has had no response for a long time and will be closed. You can reopen it or open a new issue if you still have questions.


From Bot

TingquanGao avatar Oct 04 '25 03:10 TingquanGao