[Regression] MKLDNN does not work at all anymore in version 3.0.2
🔎 Search before asking
- [x] I have searched the PaddleOCR Docs and found no similar bug report.
- [x] I have searched the PaddleOCR Issues and found no similar bug report.
- [x] I have searched the PaddleOCR Discussions and found no similar bug report.
🐛 Bug (Problem description)
When running the following command:
(venv-3.12) PS C:\Users\GPUVM\Desktop\New folder (13)> paddleocr ocr -i "Path\to\images" --lang ch --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation false --device cpu --enable_mkldnn true
MKLDNN is not enabled; the run mode falls back to "paddle". The same happens when the "--enable_mkldnn" argument is omitted. I also tested this with today's latest release branch of paddlex.
The cause is fairly clear. In the code, the PaddleXPipelineWrapper calls prepare_common_init_args like this:
def _create_paddlex_pipeline(self):
    kwargs = prepare_common_init_args(None, self._common_args)
    return create_pipeline(config=self._merged_paddlex_config, **kwargs)
This means that here:
def prepare_common_init_args(model_name, common_args):
    device = common_args["device"]
    if device is None:
        device = get_default_device()
    device_type, device_ids = parse_device(device)
    if device_ids is not None:
        device_id = device_ids[0]
    else:
        device_id = None

    init_kwargs = {}
    init_kwargs["use_hpip"] = common_args["enable_hpi"]
    init_kwargs["hpi_config"] = {
        "device_type": device_type,
        "device_id": device_id,
    }

    pp_option = PaddlePredictorOption(
        model_name, device_type=device_type, device_id=device_id
    )
    ...
the model_name is always None, as it is never populated in the Pipeline Wrapper. That means that in paddlex get_default_run_mode always returns paddle as the running mode:
def get_default_run_mode(model_name, device_type):
    if not model_name:
        return "paddle"
    if device_type != "cpu":
        return "paddle"
    if (
        ENABLE_MKLDNN_BYDEFAULT
        and is_mkldnn_available()
        and model_name not in MKLDNN_BLOCKLIST
    ):
        return "mkldnn"
    else:
        return "paddle"
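To make the effect concrete, here is a minimal, self-contained sketch of the function above. The constant values, the blocklist entry and the is_mkldnn_available stub are stand-ins I picked for illustration, not the real PaddleX values, and the model name in the second call is just an example:

```python
# Stand-ins for the PaddleX internals; these values are assumptions for illustration.
ENABLE_MKLDNN_BYDEFAULT = True
MKLDNN_BLOCKLIST = {"SomeBlockedModel"}  # hypothetical entry


def is_mkldnn_available():
    # Stub: pretend the installed Paddle build supports MKLDNN.
    return True


def get_default_run_mode(model_name, device_type):
    # Same branching as the excerpt above.
    if not model_name:
        return "paddle"
    if device_type != "cpu":
        return "paddle"
    if (
        ENABLE_MKLDNN_BYDEFAULT
        and is_mkldnn_available()
        and model_name not in MKLDNN_BLOCKLIST
    ):
        return "mkldnn"
    return "paddle"


# What the pipeline wrapper effectively asks for (model_name is None):
print(get_default_run_mode(None, "cpu"))  # paddle
# What a populated model name would give on CPU:
print(get_default_run_mode("PP-OCRv5_server_det", "cpu"))  # mkldnn
```

The first branch returns before the MKLDNN checks are ever reached, so with model_name=None the availability and blocklist checks play no role.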
And because PaddleOCR in prepare_common_init_args never sets the run_mode explicitly here:
elif device_type == "cpu":
    enable_mkldnn = common_args["enable_mkldnn"]
    if enable_mkldnn:
        pp_option.mkldnn_cache_capacity = common_args["mkldnn_cache_capacity"]
    else:
        pp_option.run_mode = "paddle"
MKLDNN can never be enabled. Adding a line like pp_option.run_mode = "mkldnn" inside the if enable_mkldnn branch enables mkldnn again, but I don't think that's the proper solution to this issue.
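A rough sketch of that one-line workaround, using a stub instead of the real PaddlePredictorOption. StubOption, apply_cpu_args and the explicit_mkldnn switch are hypothetical names that exist only in this example, and the cache capacity value is arbitrary:

```python
class StubOption:
    """Hypothetical stand-in for PaddlePredictorOption (only two attributes)."""

    def __init__(self):
        # Mirrors the situation above: model_name was None, so the default is "paddle".
        self.run_mode = "paddle"
        self.mkldnn_cache_capacity = None


def apply_cpu_args(pp_option, common_args, explicit_mkldnn):
    # Same shape as the CPU branch above; explicit_mkldnn toggles the added line.
    enable_mkldnn = common_args["enable_mkldnn"]
    if enable_mkldnn:
        if explicit_mkldnn:
            pp_option.run_mode = "mkldnn"  # the added line
        pp_option.mkldnn_cache_capacity = common_args["mkldnn_cache_capacity"]
    else:
        pp_option.run_mode = "paddle"
    return pp_option


args = {"enable_mkldnn": True, "mkldnn_cache_capacity": 10}
print(apply_cpu_args(StubOption(), args, explicit_mkldnn=False).run_mode)  # paddle
print(apply_cpu_args(StubOption(), args, explicit_mkldnn=True).run_mode)   # mkldnn
```

Without the added line, enable_mkldnn=True only sets the cache capacity and the run mode keeps its earlier default.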
🏃‍♂️ Environment (Runtime environment)
- OS: Windows 11
- PaddleOCR 3.0.2
- PaddlePaddle 3.0.0 (CPU version)
- 16GB RAM
- GPU: Nvidia GTX 1660 TI
- Installed via pip in a venv with Python 3.12
🌰 Minimal Reproducible Example (Minimal demo that reproduces the problem)
Explained above.
I took a deeper look at this. The issue is quite tricky: as mentioned earlier, the pp_option run mode is initially set to "paddle" because the model_name is None. Then, in the BasePipeline class in paddlex (base.py), the model_name is set at a later stage:
if self.pp_option is not None:
    pp_option = self.pp_option.copy()
    pp_option.model_name = config["model_name"]
    pp_option.run_mode = self.pp_option.run_mode
But the run mode is already set to "paddle", so this assignment doesn't change anything. I first thought the easy solution would be to replace the last line from above with this:
pp_option.reset_run_mode_by_default(model_name=config["model_name"])
This updates the run_mode a second time, after the model_name is set. This works: mkldnn is enabled again. But it also means that mkldnn cannot be disabled anymore, because get_default_run_mode will now always return mkldnn; the function has no parameter that tells it MKLDNN was explicitly disabled. I think this replacement is right in theory, but PaddleX needs an option to explicitly disable mkldnn, because as far as I can tell get_default_run_mode() does not take this into account right now.
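One way such an option could look, purely as a sketch: a tri-state flag that is checked before the default rule runs. resolve_run_mode and its enable_mkldnn parameter are hypothetical and not part of PaddleX, and get_default_run_mode is reduced to a stand-in that leaves out the availability and blocklist checks:

```python
def get_default_run_mode(model_name, device_type):
    # Simplified stand-in for the PaddleX function (availability and
    # blocklist checks omitted for brevity).
    if not model_name or device_type != "cpu":
        return "paddle"
    return "mkldnn"


def resolve_run_mode(model_name, device_type, enable_mkldnn=None):
    """Hypothetical helper.

    enable_mkldnn=False forces "paddle"; None or True defers to the default rule.
    """
    if enable_mkldnn is False:
        return "paddle"
    return get_default_run_mode(model_name, device_type)


# Default: mkldnn once the model name is known (example model name).
print(resolve_run_mode("PP-OCRv5_server_det", "cpu"))  # mkldnn
# Explicit opt-out still wins after the model name is set.
print(resolve_run_mode("PP-OCRv5_server_det", "cpu", enable_mkldnn=False))  # paddle
```

The point of using None rather than False as the default is that "not specified" and "explicitly disabled" stay distinguishable when the run mode is recomputed later.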
Good morning.
Why do you think that "adding a line like pp_option.run_mode = "mkldnn" in the if enable_mkldnn statement" is not a good solution? What could go wrong?
@caa24 It definitely works. One of the maintainers, @Bobholamovic, opened a PR with exactly this change this morning to fix another issue: #15790. So that would fix this issue as well. I thought this was not the right way to handle it because, from what I understood in other conversations, PaddleX, which does most of the work in the backend, should enable mkldnn by default without it being set explicitly via this option. In theory that would also work if the model name were not None when the predictor is initialized... @Bobholamovic What do you think? If this is fine, this issue can also be closed via #15790.
I would say that the approach I implemented in #15790 works, but it's not an optimal solution.
Currently, the logic for determining the default backend in PaddleX is relatively complex. Roughly speaking, if the device is not set (or set to None) and pp_option is not provided, the backend is automatically determined based on a "use MKLDNN if possible" rule. This is the default behavior of PaddleX, as I explained in #15656.
However, the initialization of Predictor is not that straightforward. For example, if device is not set but pp_option is provided, then the default logic will be overridden. In this case, pp_option.run_mode remains unchanged (which is paddle) — and that’s exactly the pitfall I encountered in PaddleOCR 3.0.2, which led to the issue where "MKLDNN does not work at all."
By explicitly setting pp_option.run_mode = 'mkldnn', the program behaves as expected on devices that support MKL-DNN. However, on devices that don't support it, users will see a warning like "MKL-DNN is not available, falling back to CPU." That’s why I say this is a suboptimal solution. The warning can be suppressed by also setting enable_mkldnn = True. That said, a better approach would have been to let PaddleX handle the default run_mode internally, which would avoid this kind of warning. However, I haven’t fully sorted out the logic in this part yet, so for now I went with this workaround.
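The difference between the two approaches can be modelled roughly as follows. This is not PaddleX code: explicit_mode and default_mode are hypothetical functions, the availability flag is passed in by hand, and the warning text is only the paraphrase used above:

```python
import warnings


def explicit_mode(mkldnn_available):
    """Workaround behaviour: always request mkldnn, warn and fall back if unavailable."""
    if not mkldnn_available:
        warnings.warn("MKL-DNN is not available, falling back to CPU.")
        return "paddle"
    return "mkldnn"


def default_mode(mkldnn_available):
    """Preferred behaviour: choose mkldnn only when available, with no warning."""
    return "mkldnn" if mkldnn_available else "paddle"


# Both end up on the same backend; only the explicit request is noisy
# on a machine without MKL-DNN support.
for available in (True, False):
    print(available, explicit_mode(available), default_mode(available))
```

Both functions select the same backend in every case; the only observable difference is the warning on unsupported devices.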
Also, testing in this area is currently lacking, and we’re actively working on improving it. Once the test coverage is more complete, we’ll be able to ensure that PaddleOCR works as expected before releasing new versions. Sorry for introducing this bug, and thanks again for reporting the bug!