
No input validation for specified device

Open timminator opened this issue 6 months ago • 4 comments

🔎 Search before asking

  • [x] I have searched the PaddleOCR Docs and found no similar bug report.
  • [x] I have searched the PaddleOCR Issues and found no similar bug report.
  • [x] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (问题描述)

There is currently no input validation for the specified device. When, for example, running from the command line:

paddleocr ocr --i "D:\Path\to\image" --lang ch  --use_doc_orientation_classify false --use_doc_unwarping false --use_textline_orientation false --device gpu

the device is just assumed to be correct in _common_args.py:

def prepare_common_init_args(model_name, common_args):
    device = common_args["device"]
    if device is None:
        device = get_default_device()
    device_type, _ = parse_device(device)

    init_kwargs = {"device": device}

This results, for example, in the following problem: if you have a system without a supported GPU and you specify "--device gpu", MKL-DNN will not be enabled, even though the inference falls back to the CPU. I came up with the following solution, which I could make a PR for, but I could not come up with a good way to verify the availability of XPU, NPU, etc. That is another issue: when specifying xpu you just see a runtime error instead of a graceful switch to the CPU.

My current solution:

def prepare_common_init_args(model_name, common_args):
    requested_device = common_args["device"]
    requested_device_type = None

    if requested_device is not None:
        requested_device_type, _ = parse_device(requested_device)

    supported_device = get_default_device()
    supported_device_type, _ = parse_device(supported_device)

    # Fallback if the requested device is not supported
    if requested_device_type == "gpu" and supported_device_type != "gpu":
        device = supported_device
        device_type = supported_device_type
    elif requested_device_type is None:
        device = supported_device
        device_type = supported_device_type
    else:
        device = requested_device
        device_type = requested_device_type

    init_kwargs = {"device": device}
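The branching above can be condensed into a small, testable helper. Below is a sketch where parse_device and the supported default device are passed in explicitly; the names and signatures here are illustrative, not the actual PaddleX/PaddleOCR API:

```python
def resolve_device(requested_device, default_device, parse_device):
    """Return (device, device_type), falling back to the supported
    default when an unavailable GPU is requested."""
    default_type, _ = parse_device(default_device)
    if requested_device is None:
        return default_device, default_type
    requested_type, _ = parse_device(requested_device)
    # Fall back if a GPU was requested but the default (i.e. what is
    # actually supported on this machine) is not a GPU.
    if requested_type == "gpu" and default_type != "gpu":
        return default_device, default_type
    return requested_device, requested_type
```

With this shape, prepare_common_init_args would reduce to a single call plus `init_kwargs = {"device": device}`.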

🏃‍♂️ Environment (运行环境)

  • OS: Windows 11
  • PaddleOCR 3.0.1
  • PaddlePaddle 3.0.0 (CPU version)
  • 16 GB RAM
  • GPU: Nvidia GTX 1660 Ti
  • Installed via pip in a venv with Python 3.11

🌰 Minimal Reproducible Example (最小可复现问题的Demo)

Explained above

timminator avatar Jun 09 '25 12:06 timminator

We plan to enable the MKLDNN option by default in the future, which may affect the default behavior when specifying devices. Stay tuned.

zhang-prog avatar Jun 11 '25 12:06 zhang-prog

We plan to enable the MKLDNN option by default in the future,

Sorry, but this is not true: it has already been enabled by default since 3.0.1, as mentioned in the changelog. Not validating the specified device prevents PaddleOCR from enabling MKL-DNN in this case, so we definitely need to add a few checks here. @Bobholamovic I know tagging people is not too well liked, but could you take a look at this?

timminator avatar Jun 12 '25 21:06 timminator

PaddleOCR currently assumes that the device provided by the user is available, and it will attempt to use that device to build the underlying paddle.inference.Predictor. When an incorrect device is provided, the behavior of the program is undefined—it might fail directly, fall back to CPU, etc. If we can add some checks to reject unavailable devices or provide hints for fallback behaviors, that would definitely be more user-friendly. I think that’s a good idea. However, I’ve given some thought to how this should be implemented, and here are my suggestions:

  1. PaddleOCR 3.0 is positioned more as a wrapper around PaddleX’s OCR-related capabilities. I suggest placing the device validation logic in PaddleX rather than PaddleOCR, because this feature shouldn’t be exclusive to PaddleOCR. The benefit is that PaddleOCR can remain lightweight, while PaddleX takes care of the more complex underlying logic.

  2. Regarding the request to "fall back to CPU and use MKL-DNN when GPU is unavailable": In the next version of PaddleX, we plan to change the default run_mode on CPU to mkldnn. This means if the provided device is CPU, MKL-DNN acceleration will be enabled by default. (Currently, PaddleOCR and PaddleX have different default behaviors, but this will be unified in the next version.) So, for this specific requirement, I recommend using the GPUtil library (or any alternatives) to check for GPU availability here, implement the fallback logic, and provide a user-friendly message. When PaddlePredictorOption.device_type is set to cpu, the subsequent logic will automatically enable MKL-DNN. Additionally, I don’t recommend using get_default_device to check for GPU availability—it’s intended to get the default device, not to verify GPU availability. Using it this way relies on implementation-specific side effects, which is probably not good practice. If necessary, consider extracting this logic into an is_gpu_available function for reuse.

  3. Full validation may not be easy to implement, especially when multiple hardware types are involved (e.g., XPU, MLU). We’d need to consider: whether the physical device is available, whether the framework supports it (not just Paddle—in high-performance inference scenarios, other inference frameworks may be involved), and different devices may require very different handling. There are also edge cases—for instance, a user might install a CPU-only version of the framework in a GPU-enabled environment and attempt GPU inference using the high-performance inference plugin, or install a GPU version of the framework in a GPU-less environment and try to run XPU inference. It’s hard to predict user behavior. While doing proper validation is a good idea, I haven’t had the time to think through a clean way to handle all such situations. One initial idea is to use this API at least for Paddle, but I’m not sure how reliable it is—we’ll likely need more testing.
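One minimal shape for the is_gpu_available helper suggested in point 2, probing Paddle itself rather than relying on get_default_device's side effects. The Paddle calls used here are an assumption and would need verifying against the installed Paddle version:

```python
def is_gpu_available() -> bool:
    """Best-effort check: Paddle importable, built with CUDA, and at
    least one GPU visible. Any failure is treated as 'no GPU', so
    callers can fall back to CPU gracefully."""
    try:
        import paddle
    except ImportError:
        return False
    try:
        return bool(
            paddle.is_compiled_with_cuda()
            and paddle.device.cuda.device_count() > 0
        )
    except Exception:
        return False
```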

Thanks for the suggestion! As mentioned above, I feel that implementing full validation might not be a trivial task, but addressing specific cases (like GPU fallback to CPU) is relatively straightforward. If you're interested, contributions are very welcome.
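As a starting point for such a contribution, the multi-hardware concern from point 3 could be structured as a registry of per-device-type probes, so each backend supplies its own check and unknown types are passed through with a warning rather than rejected outright. All names below are hypothetical, not existing PaddleX APIs:

```python
import warnings
from typing import Callable, Dict

_DEVICE_CHECKS: Dict[str, Callable[[], bool]] = {}

def register_device_check(device_type: str):
    """Decorator registering an availability probe for one device type."""
    def _register(fn: Callable[[], bool]):
        _DEVICE_CHECKS[device_type] = fn
        return fn
    return _register

@register_device_check("cpu")
def _cpu_available() -> bool:
    return True  # CPU is always assumed usable

def is_device_available(device_type: str) -> bool:
    """Run the registered probe; device types without a probe (e.g.
    xpu, mlu) are accepted with a warning, since fully validating
    every backend is impractical."""
    check = _DEVICE_CHECKS.get(device_type)
    if check is None:
        warnings.warn(
            f"No availability check registered for {device_type!r}; "
            "assuming it is usable."
        )
        return True
    return check()
```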

Bobholamovic avatar Jun 13 '25 03:06 Bobholamovic

Thanks for giving such a detailed draft on how to implement this.

Regarding the request to "fall back to CPU and use MKL-DNN when GPU is unavailable": In the next version of PaddleX, we plan to change the default run_mode on CPU to mkldnn. This means if the provided device is CPU, MKL-DNN acceleration will be enabled by default. (Currently, PaddleOCR and PaddleX have different default behaviors, but this will be unified in the next version.) So, for this specific requirement, I recommend using the GPUtil library (or any alternatives) to check for GPU availability here, implement the fallback logic, and provide a user-friendly message. When PaddlePredictorOption.device_type is set to cpu, the subsequent logic will automatically enable MKL-DNN. Additionally, I don’t recommend using get_default_device to check for GPU availability—it’s intended to get the default device, not to verify GPU availability. Using it this way relies on implementation-specific side effects, which is probably not good practice. If necessary, consider extracting this logic into an is_gpu_available function for reuse.

I tried to work on this, but noticed two things. Issue #15793 is currently preventing me from successfully implementing this: when adding such a device check in the device_type setter, I can successfully switch back to the CPU, but I cannot enable MKL-DNN, because the model_name is None at that stage, so get_default_run_mode returns paddle instead of mkldnn. This needs to be resolved first; then I can make a PR.

The second thing is this:

This means if the provided device is CPU, MKL-DNN acceleration will be enabled by default.

Doesn't this mean that your PR #15790 is not the right solution to #15738 and #15782? Because from this statement I thought that, with this approach, explicitly setting mkldnn should not be done anymore... You can see more about this in my comments in #15782.

timminator avatar Jun 20 '25 11:06 timminator

Why was this closed? Is this fixed in the new PaddleOCR 3.2 release?

timminator avatar Aug 22 '25 07:08 timminator

Tested it on the new PaddleOCR 3.2 release. This is NOT fixed! The issue still persists. Please reopen this issue. Issues cannot be closed just because 2 months passed...

timminator avatar Aug 22 '25 15:08 timminator

I made a PR over at the PaddleX repo now to fix this issue. The other issue, #15793, that prevented me from doing this 2 months ago has been fixed in the meantime. @Bobholamovic I would appreciate it if you could take a look at it. I had to do it a bit differently than you suggested: I could not add the GPU compatibility check to the device_type setter because the model_name is unknown there, so I added it to the setdefault_by_model_name() function instead, where the model name is known. I also used Paddle to check for GPU compatibility, because via GPUtil we can determine whether a GPU is installed but not whether a Paddle version with CUDA support is installed; that's why I did it like this. I'm interested in your feedback, or, if everything is fine, I would appreciate an approval.

timminator avatar Aug 24 '25 22:08 timminator