[Bug]: If some chunks fail due to GPU OOM, the executor won't retry the failed chunks and the entire task fails.
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
RAGFlow workspace code commit ID
448fa1c
RAGFlow image version
v0.16.0
Other environment information
GPU: A10 (24G)
OS: Ubuntu 22.04
nvidia: NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6
Actual behavior
```
2025-02-17 20:46:19.147064849 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/model.19/cv2/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=8a8fe068e7eb ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);

2025-02-17 20:46:19,153 INFO 10 set_progress(a9b49952ed2c11efbe6b02420a000082), progress: -1, progress_msg: 20:46:19 Page(121~133): [ERROR]Internal server error while chunking: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'/model.19/cv2/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=8a8fe068e7eb ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);

2025-02-17 20:46:19,164 ERROR 10 Chunking 专利审查指南2023(官网发布版).pdf/专利审查指南2023(官网发布版).pdf got exception
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 218, in build_chunks
    cks = chunker.chunk(task["name"], binary=binary, from_page=task["from_page"],
  File "/ragflow/rag/app/laws.py", line 168, in chunk
    for txt, poss in pdf_parser(filename if not binary else binary,
  File "/ragflow/rag/app/laws.py", line 131, in __call__
    self._layouts_rec(zoomin)
  File "/ragflow/deepdoc/parser/pdf_parser.py", line 327, in _layouts_rec
    self.boxes, self.page_layout = self.layouter(
  File "/ragflow/deepdoc/vision/layout_recognizer.py", line 70, in __call__
    layouts = super().__call__(image_list, thr, batch_size)
  File "/ragflow/deepdoc/vision/recognizer.py", line 483, in __call__
    bb = self.postprocess(self.ort_sess.run(None, {k:v for k,v in ins.items() if k in self.input_names}, self.run_options)[0], ins, thr)
  File "/ragflow/.venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'/model.19/cv2/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=8a8fe068e7eb ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);
```
Expected behavior
No response
Steps to reproduce
I set WS=2, so two tasks run in parallel. At peak, the total GPU memory used by the two tasks exceeds 24 GB, even though each individual task usually uses less than 5 GB. If RAGFlow implemented an OOM (out-of-memory) retry mechanism, these tasks should therefore complete successfully; see the sketch below.
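A minimal sketch of what such a retry could look like, assuming a hypothetical wrapper around the `chunker.chunk(...)` call seen in the traceback from `rag/svr/task_executor.py`; the `chunk_with_oom_retry` name, the retry counts/backoff, and the OOM detection by message substring are all illustrative assumptions, not RAGFlow's actual code:

```python
import logging
import re
import time

# Match the "CUDA failure 2: out of memory" text that ONNX Runtime embeds in
# the RuntimeException message, so only OOM-like failures are retried.
OOM_PATTERN = re.compile(r"out of memory|CUDA failure 2", re.IGNORECASE)


def chunk_with_oom_retry(run_chunking, *args, retries=3, backoff_s=30, **kwargs):
    """Call run_chunking(*args, **kwargs), retrying only on GPU OOM errors."""
    for attempt in range(retries + 1):
        try:
            return run_chunking(*args, **kwargs)
        except Exception as e:
            # Re-raise anything that is not an OOM, or when retries are exhausted.
            if attempt >= retries or not OOM_PATTERN.search(str(e)):
                raise
            wait = backoff_s * (attempt + 1)
            logging.warning("GPU OOM while chunking (attempt %d/%d), retrying in %ds",
                            attempt + 1, retries, wait)
            time.sleep(wait)
```

With a wrapper like this, the chunking call in `build_chunks` could be invoked as `chunk_with_oom_retry(chunker.chunk, task["name"], binary=binary, from_page=task["from_page"], ...)`, so a chunk that hits a transient OOM (e.g. while the other parallel task is at its memory peak) is retried instead of failing the whole task.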
Additional information
No response
onnxruntime does not support GPU well.
Can RAGFlow add some kind of retry mechanism at the chunk level?
For the GPU memory issue caused by ONNX, could you try again? The OOM might be caused by memory fragmentation; we've just found a potential solution.
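For anyone hitting this before a fix lands: one common way to limit arena growth (one source of the fragmentation mentioned above) is to pass CUDA provider options when the ONNX Runtime session is created. A minimal sketch, assuming a standalone session; the model path is a placeholder, the 5 GB limit is only an illustrative figure based on the reported per-task usage, and this is not necessarily the solution the maintainers are referring to:

```python
import onnxruntime as ort

# Cap the CUDA memory arena and grow it only by what each allocation requests,
# which tends to reduce fragmentation when several executor tasks share one GPU.
cuda_provider_options = {
    "device_id": 0,
    "gpu_mem_limit": 5 * 1024 * 1024 * 1024,  # bytes; illustrative limit
    "arena_extend_strategy": "kSameAsRequested",
}

sess = ort.InferenceSession(
    "layout_model.onnx",  # placeholder model path
    providers=[
        ("CUDAExecutionProvider", cuda_provider_options),
        "CPUExecutionProvider",  # fall back to CPU if CUDA is unavailable
    ],
)
```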