[Bug]: If some chunks fail due to GPU OOM, the executor won't retry the failed chunks and the entire task fails.
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
RAGFlow workspace code commit ID
448fa1c
RAGFlow image version
v0.16.0
Other environment information
GPU: A10 (24G)
OS: Ubuntu 22.04
nvidia: NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6
Actual behavior
```
2025-02-17 20:46:19.147064849 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/model.19/cv2/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=8a8fe068e7eb ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);

2025-02-17 20:46:19,153 INFO 10 set_progress(a9b49952ed2c11efbe6b02420a000082), progress: -1, progress_msg: 20:46:19 Page(121~133): [ERROR]Internal server error while chunking: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'/model.19/cv2/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=8a8fe068e7eb ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);

2025-02-17 20:46:19,164 ERROR 10 Chunking 专利审查指南2023(官网发布版).pdf/专利审查指南2023(官网发布版).pdf got exception
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 218, in build_chunks
    cks = chunker.chunk(task["name"], binary=binary, from_page=task["from_page"],
  File "/ragflow/rag/app/laws.py", line 168, in chunk
    for txt, poss in pdf_parser(filename if not binary else binary,
  File "/ragflow/rag/app/laws.py", line 131, in __call__
    self._layouts_rec(zoomin)
  File "/ragflow/deepdoc/parser/pdf_parser.py", line 327, in _layouts_rec
    self.boxes, self.page_layout = self.layouter(
  File "/ragflow/deepdoc/vision/layout_recognizer.py", line 70, in __call__
    layouts = super().__call__(image_list, thr, batch_size)
  File "/ragflow/deepdoc/vision/recognizer.py", line 483, in __call__
    bb = self.postprocess(self.ort_sess.run(None, {k:v for k,v in ins.items() if k in self.input_names}, self.run_options)[0], ins, thr)
  File "/ragflow/.venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'/model.19/cv2/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=8a8fe068e7eb ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);
```
Expected behavior
No response
Steps to reproduce
I set WS=2, so two tasks run in parallel. At peak, the total GPU memory used by the two tasks exceeds 24 GB, even though each individual task usually uses less than 5 GB. If RAGFlow implemented an OOM (out-of-memory) retry mechanism, these tasks should therefore complete successfully; see the sketch below.
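A minimal sketch of what such a retry could look like, assuming a hypothetical wrapper around the `chunker.chunk(...)` call seen in the traceback from `rag/svr/task_executor.py`; the `chunk_with_oom_retry` name, the retry counts/backoff, and the OOM detection by message substring are all illustrative assumptions, not RAGFlow's actual code:

```python
import logging
import re
import time

# Match the "CUDA failure 2: out of memory" text that ONNX Runtime embeds in
# the RuntimeException message, so only OOM-like failures are retried.
OOM_PATTERN = re.compile(r"out of memory|CUDA failure 2", re.IGNORECASE)


def chunk_with_oom_retry(run_chunking, *args, retries=3, backoff_s=30, **kwargs):
    """Call run_chunking(*args, **kwargs), retrying only on GPU OOM errors."""
    for attempt in range(retries + 1):
        try:
            return run_chunking(*args, **kwargs)
        except Exception as e:
            # Re-raise anything that is not an OOM, or when retries are exhausted.
            if attempt >= retries or not OOM_PATTERN.search(str(e)):
                raise
            wait = backoff_s * (attempt + 1)
            logging.warning("GPU OOM while chunking (attempt %d/%d), retrying in %ds",
                            attempt + 1, retries, wait)
            time.sleep(wait)
```

With a wrapper like this, the chunking call in `build_chunks` could be invoked as `chunk_with_oom_retry(chunker.chunk, task["name"], binary=binary, from_page=task["from_page"], ...)`, so a chunk that hits a transient OOM (e.g. while the other parallel task is at its memory peak) is retried instead of failing the whole task.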
Additional information
No response
onnxruntime does not support GPU well.
Can RAGFlow add some kind of retry mechanism at the chunk level?
For the GPU memory issue caused by ONNX, could you try again? The OOM might be caused by memory fragmentation; we've just found a potential solution.
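For anyone hitting this before a fix lands: one common way to limit arena growth (one source of the fragmentation mentioned above) is to pass CUDA provider options when the ONNX Runtime session is created. A minimal sketch, assuming a standalone session; the model path is a placeholder, the 5 GB limit is only an illustrative figure based on the reported per-task usage, and this is not necessarily the solution the maintainers are referring to:

```python
import onnxruntime as ort

# Cap the CUDA memory arena and grow it only by what each allocation requests,
# which tends to reduce fragmentation when several executor tasks share one GPU.
cuda_provider_options = {
    "device_id": 0,
    "gpu_mem_limit": 5 * 1024 * 1024 * 1024,  # bytes; illustrative limit
    "arena_extend_strategy": "kSameAsRequested",
}

sess = ort.InferenceSession(
    "layout_model.onnx",  # placeholder model path
    providers=[
        ("CUDAExecutionProvider", cuda_provider_options),
        "CPUExecutionProvider",  # fall back to CPU if CUDA is unavailable
    ],
)
```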