🔎 Search before asking
- [X] I have searched the PaddleOCR Docs and found no similar bug report.
- [X] I have searched the PaddleOCR Issues and found no similar bug report.
- [X] I have searched the PaddleOCR Discussions and found no similar bug report.
🐛 Bug (问题描述)
[2024/08/27 19:14:23] ppocr INFO: epoch: [5/500], global_step: 10, lr: 0.001000, loss: 2.168079, loss_shrink_maps: 1.022120, loss_threshold_maps: 0.760488, loss_binary_maps: 0.204714, loss_cbn: 0.204714, avg_reader_cost: 0.03694 s, avg_batch_cost: 0.04500 s, avg_samples: 0.12, ips: 2.66682 samples/s, eta: 0:41:51, max_mem_reserved: 13909 MB, max_mem_allocated: 11894 MB
eval model:: 0%| | 0/4 [00:00<?, ?it/s]Traceback (most recent call last):
File "/app/ocr/PaddleOCR-release-2.8/tools/train.py", line 257, in
main(config, device, logger, vdl_writer, seed)
File "/app/ocr/PaddleOCR-release-2.8/tools/train.py", line 209, in main
program.train(
File "/app/ocr/PaddleOCR-release-2.8/tools/program.py", line 452, in train
cur_metric = eval(
File "/app/ocr/PaddleOCR-release-2.8/tools/program.py", line 622, in eval
preds = model(images)
File "/home/anaconda3/envs/pd-ocr/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/app/ocr/PaddleOCR-release-2.8/ppocr/modeling/architectures/base_model.py", line 99, in forward
x = self.head(x, targets=data)
File "/home/anaconda3/envs/pd-ocr/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/app/ocr/PaddleOCR-release-2.8/ppocr/modeling/heads/det_db_head.py", line 145, in forward
cbn_maps = self.cbn_layer(self.up_conv(f), shrink_maps, None)
File "/home/anaconda3/envs/pd-ocr/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/app/ocr/PaddleOCR-release-2.8/ppocr/modeling/heads/det_db_head.py", line 127, in forward
out = self.last_1(self.last_3(outf))
File "/home/anaconda3/envs/pd-ocr/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/app/ocr/PaddleOCR-release-2.8/ppocr/modeling/backbones/det_mobilenet_v3.py", line 186, in forward
x = self.conv(x)
File "/home/anaconda3/envs/pd-ocr/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1429, in call
return self.forward(*inputs, **kwargs)
File "/home/anaconda3/envs/pd-ocr/lib/python3.10/site-packages/paddle/nn/layer/conv.py", line 715, in forward
out = F.conv._conv_nd(
File "/home/anaconda3/envs/pd-ocr/lib/python3.10/site-packages/paddle/nn/functional/conv.py", line 128, in _conv_nd
pre_bias = _C_ops.conv2d(
MemoryError:
C++ Traceback (most recent call last):
0 paddle::pybind::eager_api_conv2d(_object*, _object*, _object*)
1 conv2d_ad_func(paddle::Tensor const&, paddle::Tensor const&, std::vector<int, std::allocator >, std::vector<int, std::allocator >, std::string, std::vector<int, std::allocator >, int, std::string)
2 paddle::experimental::conv2d(paddle::Tensor const&, paddle::Tensor const&, std::vector<int, std::allocator > const&, std::vector<int, std::allocator > const&, std::string const&, std::vector<int, std::allocator > const&, int, std::string const&)
3 void phi::ConvCudnnKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator > const&, std::vector<int, std::allocator > const&, std::string const&, std::vector<int, std::allocator > const&, int, std::string const&, phi::DenseTensor*)
4 float* phi::DeviceContext::Alloc(phi::TensorBase*, unsigned long, bool) const
5 phi::DeviceContext::Impl::Alloc(phi::TensorBase*, phi::Place const&, phi::DataType, unsigned long, bool, bool) const
6 phi::DenseTensor::AllocateFrom(phi::Allocator*, phi::DataType, unsigned long, bool)
7 paddle::memory::allocation::Allocator::Allocate(unsigned long)
8 paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
9 paddle::memory::allocation::Allocator::Allocate(unsigned long)
10 paddle::memory::allocation::Allocator::Allocate(unsigned long)
11 paddle::memory::allocation::Allocator::Allocate(unsigned long)
12 paddle::memory::allocation::Allocator::Allocate(unsigned long)
13 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
14 std::string phi::enforce::GetCompleteTraceBackString<std::string >(std::string&&, char const*, int)
15 phi::enforce::GetCurrentTraceBackStringabi:cxx11
Error Message Summary:
ResourceExhaustedError:
Out of memory error on GPU 1. Cannot allocate 3.158203GB memory on GPU 1, 13.315369GB memory has been allocated and available memory is only 2.386902GB.
Please check whether there is any other process using GPU 1.
- If yes, please stop them, or start PaddlePaddle on another GPU.
- If no, please decrease the batch size of your model.
(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:86)
🏃♂️ Environment (运行环境)
PaddlePaddle-gpu:2.6 PaddleOCR:2.8 RAM:16G
🌰 Minimal Reproducible Example (最小可复现问题的Demo)
python tools/train.py -c configs/det/ch_PP-OCRv4/ch_PP-OCRv4_det_teacher.yml