Error information shown in Gradio WebUI
🔎 Search before asking | 提交之前请先搜索
- [x] I have searched the MinerU Readme and found no similar bug report.
- [x] I have searched the MinerU Issues and found no similar bug report.
- [x] I have searched the MinerU Discussions and found no similar bug report.
🤖 Consult the online AI assistant for assistance | 在线 AI 助手咨询
- [x] I have consulted the online AI assistant but was unable to obtain a solution to the issue.
Description of the bug | 错误描述
The following information is shown:
Containers mineru-gradio
mineru-gradio b8683f588cc5 mineru-vllm:latest 7860:7860 STATUS Running (18 minutes ago)
Start init vLLM engine...
INFO 09-19 05:35:10 [__init__.py:241] Automatically detected platform cuda.
INFO 09-19 05:35:15 [__init__.py:711] Resolved architecture: Qwen2VLForConditionalGeneration
INFO 09-19 05:35:15 [__init__.py:1750] Using max model len 16384
INFO 09-19 05:35:15 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=5120.
(EngineCore_0 pid=156) INFO 09-19 05:35:16 [core.py:636] Waiting for init message from front-end.
(EngineCore_0 pid=156) INFO 09-19 05:35:16 [core.py:74] Initializing a V1 LLM engine (v0.10.1.1) with config: model='/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2___5-2509-1___2B', speculative_config=None, tokenizer='/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2___5-2509-1___2B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2___5-2509-1___2B, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":256,"local_cache_dir":null}
(EngineCore_0 pid=156) INFO 09-19 05:35:18 [parallel_state.py:1134] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_0 pid=156) WARNING 09-19 05:35:18 [interface.py:389] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(EngineCore_0 pid=156) INFO 09-19 05:35:18 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
(EngineCore_0 pid=156) INFO 09-19 05:35:19 [gpu_model_runner.py:1953] Starting to load model /root/.cache/modelscope/hub/models/OpenDataLab/MinerU2___5-2509-1___2B...
(EngineCore_0 pid=156) INFO 09-19 05:35:19 [gpu_model_runner.py:1985] Loading model from scratch...
(EngineCore_0 pid=156) WARNING 09-19 05:35:19 [cuda.py:211] Current vllm-flash-attn has a bug inside vision module, so we use xformers backend instead. You can run pip install flash-attn to use flash-attention backend.
(EngineCore_0 pid=156) INFO 09-19 05:35:19 [cuda.py:328] Using Flash Attention backend on V1 engine.
(EngineCore_0 pid=156) INFO 09-19 05:35:19 [default_loader.py:262] Loading weights took 0.27 seconds
(EngineCore_0 pid=156) INFO 09-19 05:35:20 [gpu_model_runner.py:2007] Model loading took 2.1637 GiB and 0.416150 seconds
(EngineCore_0 pid=156) INFO 09-19 05:35:20 [gpu_model_runner.py:2591] Encoder cache will be initialized with a budget of 14175 tokens, and profiled with 1 video items of the maximum feature size.
CUDA error (/__w/xformers/xformers/third_party/flash-attention/hopper/flash_fwd_launch_template.h:188): invalid argument
2025-09-19 05:35:21.544 | ERROR | mineru.cli.gradio_app:main:277 - Engine core initialization failed. See root cause above. Failed core proc(s): {}
Traceback (most recent call last):
File "/usr/local/bin/mineru-gradio", line 7, in
sys.exit(main())
│ │ └ <Command main>
│ └ <built-in function exit>
└ <module 'sys' (built-in)>
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
│ │ │ └ {}
│ │ └ ()
│ └ <function Command.main at 0x71f19d255760>
└ <Command main>
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
│ │ └ <click.core.Context object at 0x71f19d5745f0>
│ └ <function Command.invoke at 0x71f19d255440>
└ <Command main>
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
│ │ │ │ │ └ {'server_name': '0.0.0.0', 'server_port': 7860, 'vllm_engine_enable': True, 'example_enable': True, 'api_enable': True, 'max_...
│ │ │ │ └ <click.core.Context object at 0x71f19d5745f0>
│ │ │ └ <function main at 0x71f0a4fc20c0>
│ │ └ <Command main>
│ └ <function Context.invoke at 0x71f19d254680>
└ <click.core.Context object at 0x71f19d5745f0>
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
│ │ └ {'server_name': '0.0.0.0', 'server_port': 7860, 'vllm_engine_enable': True, 'example_enable': True, 'api_enable': True, 'max_...
│ └ ()
└ <function main at 0x71f0a4fc20c0>
File "/usr/local/lib/python3.12/dist-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
│ │ │ └ {'server_name': '0.0.0.0', 'server_port': 7860, 'vllm_engine_enable': True, 'example_enable': True, 'api_enable': True, 'max_...
│ │ └ ()
│ └ <function get_current_context at 0x71f19d1ff1a0>
How to reproduce the bug | 如何复现
docker compose -f compose.yaml --profile gradio up -d — everything uses the default configuration. After Gradio starts, uploading a PDF of any size produces this error and nothing runs.
Operating System Mode | 操作系统类型
Windows
Operating System Version| 操作系统版本
WIN11+Docker+MinerU2.5
Python version | Python 版本
3.13
Software version | 软件版本 (mineru --version)
2.0.x
Device mode | 设备模式
cuda
I find that if I select "pipeline" in the backend column, OCR works fine; if I select "vlm-vllm-async-engine", the error information above is shown.
The reason OCR works with the "pipeline" backend but fails with "vlm-vllm-async-engine" (showing CUDA errors) is due to major differences in how these backends handle model loading and GPU usage.
The "pipeline" backend uses multiple specialized models (like PaddleOCR for OCR, RapidTable for tables) and is robust across different environments, supporting explicit device selection (CPU, CUDA, etc.) and handling errors gracefully. This makes it less sensitive to GPU driver/library mismatches or advanced CUDA features.
The "vlm-vllm-async-engine" backend, on the other hand, loads a single large multimodal model using the vllm library, which aggressively uses CUDA and advanced GPU features like FlashAttention or xformers. If your environment (Windows 11 + Docker + CUDA) has any incompatibility—such as mismatched CUDA drivers, missing GPU exposure in Docker, or bugs in FlashAttention/xformers—you'll see errors like "CUDA error: invalid argument" during model initialization. This backend does not support fallback to CPU or less advanced GPU ops, so it fails outright if the GPU setup isn't perfect for vllm's requirements. Recent code changes (in MinerU 2.5.x) improved compatibility with vllm >=0.10.1, but these are not present in your 2.0.x version yet, so the issue persists in your setup [source].
To troubleshoot: make sure Docker is started with --gpus all and that CUDA_VISIBLE_DEVICES is set correctly. If you are using a single GPU, try limiting visible devices to just that one. If the problem continues, consider upgrading to the latest MinerU version, which includes fixes for vllm backend initialization and better error handling [source]. If you need maximum compatibility and stability, stick with the "pipeline" backend, especially on Windows/Docker environments [source].
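The GPU-exposure advice above can be expressed in Compose form. Below is a minimal sketch of the relevant fragment, assuming the service is named `mineru-gradio` as in the container listing; the field names follow the standard Docker Compose GPU-reservation syntax, and the single-GPU restriction via `CUDA_VISIBLE_DEVICES=0` is the suggestion from the paragraph above, not something taken from the project's actual compose.yaml:

```yaml
# Hypothetical fragment; merge into the existing service entry in compose.yaml.
services:
  mineru-gradio:
    environment:
      # Limit vLLM to a single visible GPU, as suggested above.
      - CUDA_VISIBLE_DEVICES=0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # requires the NVIDIA Container Toolkit on the host
              count: all
              capabilities: [gpu]
```

This is the Compose equivalent of `docker run --gpus all`; after editing, a `docker compose --profile gradio down` followed by the original `up -d` command applies it. You can verify pass-through with `docker exec <container> nvidia-smi` before retrying the vlm backend.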
I deployed the latest MinerU 2.5. The GPU is an RTX 5090D, and the CUDA version is 12.9.
Has this been resolved? I am running into the same problem.
https://github.com/opendatalab/MinerU/issues/4136