
PaddleOCRVL OOMs on a ~500 KB image even with 40 GB of GPU memory

Open minmie opened this issue 4 weeks ago • 11 comments

🔎 Search before asking

  • [x] I have searched the PaddleOCR Docs and found no similar bug report.
  • [x] I have searched the PaddleOCR Issues and found no similar bug report.
  • [x] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (问题描述)

A strange problem: the two images below are not large, around 500 KB each, yet running them needs 40 GB+ of GPU memory and keeps hitting OOM. Please take a look.


```python
from paddleocr import PaddleOCRVL
import paddle

pipeline = PaddleOCRVL()

# A ~500 KB JPEG; this predict() call is where the OOM happens.
output = pipeline.predict("./imgs/1982753810856210448.JPEG")
for res in output:
    res.print()
    res.save_to_json(save_path="output1")
    res.save_to_markdown(save_path="output1")

paddle.device.cuda.empty_cache()
print(1)
```
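
To narrow down where the memory goes, here is a minimal monitoring sketch around the same call, using PaddlePaddle's `paddle.device.cuda` memory-statistics helpers (a single-GPU setup and the same image path are assumed):

```python
# Debugging sketch: report memory held after model load and the peak reached
# inside predict(), both on the current CUDA device.
import paddle
from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL()

after_load = paddle.device.cuda.memory_allocated()   # bytes held once the models are loaded
output = list(pipeline.predict("./imgs/1982753810856210448.JPEG"))
peak = paddle.device.cuda.max_memory_allocated()     # peak bytes observed so far

print(f"after model load:  {after_load / 1024**3:.2f} GiB")
print(f"peak in predict(): {peak / 1024**3:.2f} GiB")
```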


Error log

W1027 19:15:25.188622 1486515 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.4, Runtime API Version: 12.6
Creating model: ('PP-DocLayoutV2', None)
Model files already exist. Using cached files. To redownload, please delete the directory manually: `/home/chenjq/.paddlex/official_models/PP-DocLayoutV2`.
Creating model: ('PaddleOCR-VL-0.9B', None)
Model files already exist. Using cached files. To redownload, please delete the directory manually: `/home/chenjq/.paddlex/official_models/PaddleOCR-VL`.
Loading configuration file /home/chenjq/.paddlex/official_models/PaddleOCR-VL/config.json
Loading weights file /home/chenjq/.paddlex/official_models/PaddleOCR-VL/model.safetensors
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
use GQA - num_heads: 16- num_key_value_heads: 2
/home/chenjq/miniconda3/envs/paddle-vl/lib/python3.10/site-packages/paddle/utils/decorator_utils.py:420: Warning: 
Non compatible API. Please refer to https://www.paddlepaddle.org.cn/documentation/docs/en/develop/guides/model_convert/convert_from_pytorch/api_difference/torch/torch.split.html first.
  warnings.warn(
Loaded weights file from disk, setting weights to model.
All model checkpoint weights were used when initializing PaddleOCRVLForConditionalGeneration.

All the weights of PaddleOCRVLForConditionalGeneration were initialized from the model checkpoint at /home/chenjq/.paddlex/official_models/PaddleOCR-VL.
If your task is similar to the task the model of the checkpoint was trained on, you can already use PaddleOCRVLForConditionalGeneration for predictions without further training.
Loading configuration file /home/chenjq/.paddlex/official_models/PaddleOCR-VL/generation_config.json
Currently, the PaddleOCR-VL-0.9B local model only supports batch size of 1. The batch size will be updated to 1.
/home/chenjq/miniconda3/envs/paddle-vl/lib/python3.10/site-packages/paddle/tensor/creation.py:1088: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach(), rather than paddle.to_tensor(sourceTensor).
  return tensor(
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
/home/chenjq/miniconda3/envs/paddle-vl/lib/python3.10/site-packages/paddle/utils/decorator_utils.py:420: Warning: 
Non compatible API. Please refer to https://www.paddlepaddle.org.cn/documentation/docs/en/develop/guides/model_convert/convert_from_pytorch/api_difference/torch/torch.max.html first.
  warnings.warn(
Traceback (most recent call last):
  File "/home/chenjq/pythonWork/paddleocr-vl/main.py", line 13, in <module>
    output = pipeline.predict("./imgs/1982753810856210448.JPEG")
  File "/home/chenjq/miniconda3/envs/paddle-vl/lib/python3.10/site-packages/paddleocr/_pipelines/paddleocr_vl.py", line 134, in predict
    return list(
  File "/home/chenjq/miniconda3/envs/paddle-vl/lib/python3.10/site-packages/paddlex/inference/pipelines/_parallel.py", line 129, in predict
    yield from self._pipeline.predict(
  File "/home/chenjq/miniconda3/envs/paddle-vl/lib/python3.10/site-packages/paddlex/inference/pipelines/paddleocr_vl/pipeline.py", line 656, in predict
    raise RuntimeError(
RuntimeError: Exception from the 'vlm' worker: 

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::pybind::eager_api_softmax(_object*, _object*, _object*)
1   softmax_ad_func(paddle::Tensor const&, int, paddle::optional<paddle::Tensor*>)
2   paddle::experimental::softmax(paddle::Tensor const&, int, paddle::optional<paddle::Tensor*>)
3   void phi::SoftmaxGPUDNNKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, int, phi::DenseTensor*)
4   float* phi::DeviceContext::Alloc<float>(phi::TensorBase*, unsigned long, bool) const
5   phi::DenseTensor::AllocateFrom(phi::Allocator*, phi::DataType, unsigned long, bool)
6   paddle::memory::allocation::Allocator::Allocate(unsigned long)
7   paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
8   paddle::memory::allocation::Allocator::Allocate(unsigned long)
9   paddle::memory::allocation::Allocator::Allocate(unsigned long)
10  std::string phi::enforce::GetCompleteTraceBackString<std::string >(std::string&&, char const*, int)
11  common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError: 

Out of memory error on GPU 0. Cannot allocate 11.351109GB memory on GPU 0, 41.995789GB memory has been allocated and available memory is only 2.313477GB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model. 
 (at /paddle/paddle/phi/core/memory/allocation/cuda_allocator.cc:71)


Process finished with exit code 1


Both of the images below trigger this problem.

Image

Image

🏃‍♂️ Environment (运行环境)

paddleocr 3.3.0, paddlepaddle-gpu 3.2.0, paddlex 3.3.4

🌰 Minimal Reproducible Example (最小可复现问题的Demo)

See above.

minmie avatar Oct 27 '25 11:10 minmie

Almost the same problem here. Right after the container starts, with nothing running yet, it already occupies 5 GB of GPU memory. Processing a single image of under 500 KB pushes usage straight up to 18 GB, and then it crashes:

Out of memory error on GPU 0. Cannot allocate 5.086306GB memory on GPU 0, 18.543274GB memory has been allocated and available memory is only 4.944580GB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
 (at ../paddle/phi/core/memory/allocation/cuda_allocator.cc:71)

I honestly don't see how "PaddleOCR-VL" still counts as a small 0.9B model.

district10 avatar Oct 27 '25 16:10 district10

@district10 Yes. I tested many images, some over 10 MB and some also around 500 KB, and all of them were fine; only these two images are special.

minmie avatar Oct 28 '25 01:10 minmie
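
One thing that may explain why only these two files misbehave: file size on disk says little about decoded resolution, and a heavily compressed ~500 KB JPEG can still be many thousands of pixels on each side, which is what drives the visual token count. A quick check, sketched here with Pillow (Pillow and the file name are just an illustration; any image library works):

```python
# Sketch: print the decoded pixel dimensions of one of the problem images.
from PIL import Image

img = Image.open("./imgs/1982753810856210448.JPEG")
print(img.size)  # (width, height) in pixels; very large values mean many visual tokens
```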

I ran into this problem too. The official demo's 2 MB image finishes in roughly 7-8 seconds and GPU memory stays reasonable, yet my own 700 KB image somehow needs 30 GB of GPU memory. I don't understand it.

948024326 avatar Oct 28 '25 06:10 948024326

Most likely the LLM decoder keeps generating and doesn't stop.

MarginGitHub avatar Oct 29 '25 02:10 MarginGitHub
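
That hypothesis is at least consistent with the failed allocation in the first log: the OOM happens inside a float32 softmax kernel asking for 11.35 GB. Under the assumption that this buffer is a dense attention-score tensor of shape [num_heads, seq_len, seq_len] (num_heads = 16 from the "use GQA" lines; the layout itself is a guess, not something the traceback confirms), the implied sequence length is on the order of fourteen thousand tokens, far more than a normal text crop should need:

```python
# Back-of-the-envelope sketch: what sequence length would a dense float32
# [num_heads, seq_len, seq_len] attention-score tensor need to reach the
# 11.351109 GB allocation reported in the log? (Tensor layout is an assumption.)
import math

alloc_bytes = 11.351109 * 1024**3   # failed allocation from the error message
num_heads = 16                      # "use GQA - num_heads: 16" in the load log
bytes_per_elem = 4                  # SoftmaxGPUDNNKernel<float> runs in float32

seq_len = math.sqrt(alloc_bytes / (num_heads * bytes_per_elem))
print(f"implied seq_len: ~{seq_len:,.0f} tokens")  # roughly 13,800 with this GiB reading
```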

> Most likely the LLM decoder keeps generating and doesn't stop.

Hi, is this a bug, or is there a workaround?

948024326 avatar Oct 29 '25 03:10 948024326

Same problem here: some images that are not large blow past the memory of a 4090. The service already occupies 15 GB of GPU memory after startup, and after an image is sent for parsing the usage reaches 21 GB.

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 6.067032GB memory on GPU 0, 21.457275GB memory has been allocated and available memory is only 2.059204GB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
 (at /paddle/paddle/phi/core/memory/allocation/cuda_allocator.cc:71)

INFO: 192.168.101.4:33598 - "POST /layout-parsing HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 1133, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/routing.py", line 716, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/routing.py", line 736, in app
    await route.handle(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/routing.py", line 290, in handle
    await self.app(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 123, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 109, in app
    response = await f(request)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 389, in app
    raw_response = await run_endpoint_function(
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 288, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/serving/basic_serving/_pipeline_apps/paddleocr_vl.py", line 54, in _infer
    result = await pipeline.infer(
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/serving/basic_serving/_app.py", line 104, in infer
    return await self.call(_infer, *args, **kwargs)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/serving/basic_serving/_app.py", line 111, in call
    return await fut
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/serving/basic_serving/_app.py", line 126, in _worker
    result = func(*args, **kwargs)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/serving/basic_serving/_app.py", line 95, in _infer
    for item in it:
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/pipelines/_parallel.py", line 123, in predict
    yield from self._executor.execute(
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/pipelines/_parallel.py", line 67, in execute
    result = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/pipelines/_parallel.py", line 53, in <lambda>
    lambda pipeline, input_instances, args, kwargs: list(
  File "/opt/paddleocrvl/.venv/lib/python3.10/site-packages/paddlex/inference/pipelines/paddleocr_vl/pipeline.py", line 673, in predict
    raise RuntimeError(
RuntimeError: Exception from the 'vlm' worker:

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::pybind::CallScalarFunction(paddle::Tensor const&, double, std::string)
1   scale_ad_func(paddle::Tensor const&, paddle::experimental::ScalarBase<paddle::Tensor>, paddle::experimental::ScalarBase<paddle::Tensor>, bool, paddle::optional<paddle::Tensor*>)
2   paddle::experimental::scale(paddle::Tensor const&, paddle::experimental::ScalarBase<paddle::Tensor> const&, paddle::experimental::ScalarBase<paddle::Tensor> const&, bool, paddle::optional<paddle::Tensor*>)
3   void phi::ScaleKernel<phi::dtype::bfloat16, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, paddle::experimental::ScalarBase<phi::DenseTensor> const&, paddle::experimental::ScalarBase<phi::DenseTensor> const&, bool, phi::DenseTensor*)
4   phi::dtype::bfloat16* phi::DeviceContext::Alloc<phi::dtype::bfloat16>(phi::TensorBase*, unsigned long, bool) const
5   phi::DenseTensor::AllocateFrom(phi::Allocator*, phi::DataType, unsigned long, bool)
6   paddle::memory::allocation::Allocator::Allocate(unsigned long)
7   paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
8   paddle::memory::allocation::Allocator::Allocate(unsigned long)
9   paddle::memory::allocation::Allocator::Allocate(unsigned long)
10  std::string phi::enforce::GetCompleteTraceBackString<std::string >(std::string&&, char const*, int)
11  common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 6.067032GB memory on GPU 0, 21.457275GB memory has been allocated and available memory is only 2.059204GB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
 (at /paddle/paddle/phi/core/memory/allocation/cuda_allocator.cc:71)

Halfknow avatar Oct 29 '25 09:10 Halfknow

> I ran into this problem too. The official demo's 2 MB image finishes in roughly 7-8 seconds and GPU memory stays reasonable, yet my own 700 KB image somehow needs 30 GB of GPU memory. I don't understand it.

I switched to deploying with vLLM and calling it as a service, and the memory problem went away.

948024326 avatar Oct 30 '25 02:10 948024326
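
For anyone else moving to a served deployment (whether the default backend or a vLLM-backed one as above), the client side only needs to POST the image to the layout-parsing endpoint that appears in the serving log earlier in this thread. A minimal sketch, assuming the usual PaddleX serving request shape with a base64-encoded `file` field and `fileType` 1 for images; the host, port, and field names should be verified against the current serving docs:

```python
# Hypothetical client for a served PaddleOCR-VL pipeline; the endpoint path comes
# from the serving log above, the request fields are assumptions to double-check.
import base64
import requests

with open("./imgs/1982753810856210448.JPEG", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://127.0.0.1:8080/layout-parsing",   # placeholder host/port
    json={"file": image_b64, "fileType": 1},  # 1 = image (0 = PDF), assumed
    timeout=600,
)
resp.raise_for_status()
print(list(resp.json().keys()))               # inspect the returned structure
```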

We'll look into the related issues.

cuicheng01 avatar Oct 30 '25 06:10 cuicheng01

You can refer to this reply covering frequently asked questions about inference and deployment: https://github.com/PaddlePaddle/PaddleOCR/discussions/16822

Image

We will also keep improving the documentation to ensure a smooth deployment experience for everyone.

zhang-prog avatar Oct 30 '25 11:10 zhang-prog

Thanks for the report. Could you confirm whether the problem still reproduces on the latest versions of PaddlePaddle, PaddleX, and PaddleOCR? We have retested on several machines and have not hit OOM so far.

changdazhou avatar Nov 06 '25 08:11 changdazhou

> Almost the same problem: ~5 GB of GPU memory right after the container starts, then a single image under 500 KB pushes usage to 18 GB and it crashes with OOM.

Thanks for the feedback. Could you share a few test images as well? We are investigating this problem, and we could not reproduce the GPU memory overflow with the images attached to this issue.

changdazhou avatar Nov 06 '25 08:11 changdazhou