PaddleClas

Official sample demo fails with: MemoryError: std::bad_alloc

Open ocivo opened this issue 11 months ago • 9 comments

I wrote this by following the official sample demo directly.

  1. PaddleClas version 2.6.0, paddlepaddle version 2.6.2
  2. Python 3.8
  3. The CPU and GPU builds of paddle report the same error:

python run.py
2025-01-03 18:29:51 INFO: Loading faiss with AVX512 support.
2025-01-03 18:29:51 INFO: Successfully loaded faiss with AVX512 support.
[2025/01/03 18:29:51] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
  File "run.py", line 2, in <module>
    model = paddleclas.PaddleClas(model_name="person_attribute")
  File "venv/lib/python3.8/site-packages/paddleclas/paddleclas.py", line 610, in __init__
    self.predictor = ClsPredictor(self._config)
  File "venv/lib/python3.8/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
    super().__init__(config["Global"])
  File "venv/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
    self.predictor, self.config = self.create_paddle_predictor(
  File "venv/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
    predictor = create_predictor(config)
MemoryError: std::bad_alloc

ocivo avatar Jan 03 '25 10:01 ocivo

Please provide the complete launch command.

TingquanGao avatar Jan 03 '25 12:01 TingquanGao

@TingquanGao

Here is a way to reproduce this problem:

Reproduction steps

First, save the following test script, test.py, somewhere, for example at /example/test.py:

import paddleclas

model = paddleclas.PaddleClas(model_name="text_image_orientation")

Then run the following command to start a Docker container. The container is a clean Debian Python image.

docker run --gpus all -it --rm --shm-size=1g -v "/example:/example" python:3.10-slim bash

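Inside the container, install the necessary dependencies and run the test script. The CPU build of the dependencies is installed here, because the GPU build of Paddle is currently incompatible with other libraries I use.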

# Make the dependencies of OpenCV complete.
apt-get update
apt-get -y install libgomp1 libgl1-mesa-glx libglib2.0-0
# Install PaddleClas
pip install paddlepaddle paddleclas
# Run the test.
cd /example
python test.py

Error output

This produces the following error:

[2025/01/15 20:08:51] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 402kiB/s]
[2025/01/15 20:09:12] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
  File "/example/test.py", line 3, in <module>
    model = paddleclas.PaddleClas(model_name="text_image_orientation")
  File "/usr/local/lib/python3.10/site-packages/paddleclas/paddleclas.py", line 610, in __init__
    self.predictor = ClsPredictor(self._config)
  File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
    super().__init__(config["Global"])
  File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
    self.predictor, self.config = self.create_paddle_predictor(
  File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
    predictor = create_predictor(config)
MemoryError: std::bad_alloc

Further testing

Is it a shared memory problem?

Could the shared memory be too small? I tried changing the container launch arguments to:

docker run --gpus all -it --rm --shm-size=16g -v "/example:/example" python:3.10-slim bash

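This size is already larger than the one used in the tutorial.

However, the error persists.

Is it because of the GPU mapping?

I tried removing the --gpus all argument: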

docker run -it --rm --shm-size=1g -v "/example:/example" python:3.10-slim bash

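However, the error persists.

Is the paddlepaddle installation broken, or is the CPU build of paddlepaddle unusable?

I happen to know that NVIDIA publishes a paddlepaddle image, so I ran it and repeated the test above: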

docker run --gpus all -it --rm --shm-size=1g -v "/example:/example" nvcr.io/nvidia/paddlepaddle:24.10-py3

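Why use 24.10 rather than a newer version (for example 24.12)? Because paddleclas does not support Python 3.12, and image version 24.10 is the last one that uses Ubuntu 22.04 and Python 3.10. Note that its CUDA version, 12.6, is still nearly the latest, and that paddlepaddle-gpu is already built in.

Because the image already contains the GPU build (paddlepaddle-gpu), the installation steps need a small change: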

# Make the dependencies of OpenCV complete.
# Note that this is an Ubuntu image.
apt-get update
apt-get -y install libgomp1 libegl1 libglu1-mesa-dev
# Do not need to install paddlepaddle because paddlepaddle-gpu already exists.
pip install paddleclas
# Run the test.
cd /example
python test.py

This time it still fails, but with a different error:

[2025/01/15 20:46:49] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 406kiB/s]
Traceback (most recent call last):
  File "/example/test.py", line 3, in <module>
    model = paddleclas.PaddleClas(model_name="text_image_orientation")
  File "/usr/local/lib/python3.10/dist-packages/paddleclas/paddleclas.py", line 610, in __init__
    self.predictor = ClsPredictor(self._config)
  File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
    super().__init__(config["Global"])
  File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
    self.predictor, self.config = self.create_paddle_predictor(
  File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
    predictor = create_predictor(config)
ValueError: basic_string::_M_replace_aux

Using the official paddle image

Following the tutorial strictly, use the official paddle image.

docker run --gpus all --name ppcls -it --rm -v "/example:/example" --shm-size=8G --network=host paddlepaddle/paddle:2.3.0-gpu-cuda10.2-cudnn7 /bin/bash

Once it is running, install and test inside the container:

# Install dependencies. Do not need to fix OpenCV issues.
pip install paddleclas
# Run the test.
cd /example
python test.py

It runs successfully.

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
2025-01-15 21:16:35 INFO: Loading faiss with AVX2 support.
2025-01-15 21:16:35 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:16:35 INFO: Loading faiss.
2025-01-15 21:16:35 INFO: Successfully loaded faiss.
[2025/01/15 21:16:35] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 410kiB/s]
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:67: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': Image.NEAREST,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:68: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': Image.BILINEAR,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:69: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': Image.BICUBIC,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:70: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  'box': Image.BOX,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:71: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  'lanczos': Image.LANCZOS,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:72: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  'hamming': Image.HAMMING,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:73: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'random': (Image.BILINEAR, Image.BICUBIC)
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:73: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'random': (Image.BILINEAR, Image.BICUBIC)

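Although it runs successfully, the image is based on Ubuntu 16.04 and its Python version is 3.7. Using it would only be possible through a multi-container setup, which is far too much trouble.

Using the official paddleclas image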

docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host paddlecloud/paddleclas:2.4-gpu-cuda11.2-cudnn8-latest /bin/bash

Once it is running, test directly inside the container:

# Run the test.
cd /example
python test.py

This one also runs normally.

Is it necessary to fall back to Python 3.7?

I tried falling back to a clean Debian image with Python 3.7,

docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.7-slim bash

and repeated the earlier installation and test steps. The test passes here as well.

2025-01-15 21:33:18 INFO: Loading faiss with AVX2 support.
2025-01-15 21:33:18 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:33:18 INFO: Loading faiss.
2025-01-15 21:33:18 INFO: Successfully loaded faiss.
[2025/01/15 21:33:18] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:19<00:00, 380kiB/s]
[2025/01/15 21:33:39] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.

Then can Python 3.8 be used?

Switch to the Python 3.8 image,

docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.8-slim bash

and repeat the earlier installation and test steps. The test fails.

2025-01-15 21:38:40 INFO: Loading faiss with AVX512 support.
2025-01-15 21:38:40 INFO: Successfully loaded faiss with AVX512 support.
[2025/01/15 21:38:40] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:19<00:00, 383kiB/s]
[2025/01/15 21:39:02] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
  File "test-ori.py", line 3, in <module>
    model = paddleclas.PaddleClas(model_name="text_image_orientation")
  File "/usr/local/lib/python3.8/site-packages/paddleclas/paddleclas.py", line 610, in __init__
    self.predictor = ClsPredictor(self._config)
  File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
    super().__init__(config["Global"])
  File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
    self.predictor, self.config = self.create_paddle_predictor(
  File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
    predictor = create_predictor(config)
MemoryError: std::bad_alloc

So, is the problem in the paddlepaddle and paddleclas versions?

Still on the Python 3.8 image,

docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.8-slim bash

This time, pin paddlepaddle and paddleclas to older versions:

# Make the dependencies of OpenCV complete.
apt-get update
apt-get -y install libgomp1 libgl1-mesa-glx libglib2.0-0
# Install PaddleClas
pip install paddlepaddle==2.5.2 paddleclas==2.5.1
# Run the test.
cd /example
python test.py

It runs successfully:

2025-01-15 21:41:02 INFO: Loading faiss with AVX2 support.
2025-01-15 21:41:02 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:41:02 INFO: Loading faiss.
2025-01-15 21:41:02 INFO: Successfully loaded faiss.
[2025/01/15 21:41:02] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.

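Conclusion

~~It is intolerable that PaddleClas fails to initialize in all kinds of standard environments. It makes me wonder whether its developers' CPUs are amd64 at all.~~

It is now confirmed that there is a compatibility problem between paddlepaddle and paddleclas versions. A suitable pair of versions must be specified, neither too new nor too old.

I later ran further checks in a Python 3.8 environment:

  • paddlepaddle 2.5.2 and paddleclas 2.5.1 are compatible.
  • paddlepaddle 2.5.2 and paddleclas 2.6.0 are compatible.
  • paddlepaddle 2.6.0~2.6.2 and paddleclas 2.6.0 are incompatible. They fail with MemoryError: std::bad_alloc
  • The latest paddlepaddle 3.0.0rc0 and paddleclas 2.6.0 are also incompatible. They fail with a different error.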
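The findings in the list above can be written down as a small lookup, which may be handy for checking an environment before running the demo. This is only a sketch: the table contains just the pairs reported in this thread (Python 3.8, CPU build), and `REPORTED`, `reported_status`, and `installed_version` are hypothetical names for illustration, not part of PaddleClas.

```python
from importlib import metadata

# (paddlepaddle version, paddleclas version) -> did the demo initialize?
# Only the pairs reported in this thread; anything else is unknown.
REPORTED = {
    ("2.5.2", "2.5.1"): True,
    ("2.5.2", "2.6.0"): True,
    ("2.6.0", "2.6.0"): False,     # MemoryError: std::bad_alloc
    ("2.6.1", "2.6.0"): False,     # MemoryError: std::bad_alloc
    ("2.6.2", "2.6.0"): False,     # MemoryError: std::bad_alloc
    ("3.0.0rc0", "2.6.0"): False,  # a different error
}


def reported_status(paddle_version, paddleclas_version):
    """Return True/False if the thread reported this pair, otherwise None."""
    return REPORTED.get((paddle_version, paddleclas_version))


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

For example, `reported_status(installed_version("paddlepaddle"), installed_version("paddleclas"))` returns `None` for any pair that was not tested here, so `None` should be read as "unknown" rather than "safe".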

My environment is as follows:

  • CUDA (if used): Cuda compilation tools, release 12.6, V12.6.77
  • OS (in container python:3.10-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.10, PaddlePaddle is 2.6.2, PaddleClas is 2.6.0)
  • OS (in container nvidia/paddlepaddle): Ubuntu 22.04.5 LTS (Python is 3.10, PaddlePaddle is 2.6.1, PaddleClas is 2.6.0)
  • OS (in container paddlepaddle/paddle): Ubuntu 16.04.7 LTS (Python is 3.7, PaddlePaddle is 2.3.0, PaddleClas is 2.5.1)
  • OS (in container paddlecloud/paddleclas): Ubuntu 18.04.5 LTS (Python is 3.7, PaddlePaddle is 2.3.0.post112, PaddleClas is 0.0.0 (actually it should be 2.4, so this seems to be a dev version))
  • OS (in container python:3.7-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.7, PaddlePaddle is 2.5.2, PaddleClas is 2.5.1)
  • OS (in container python:3.8-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.8, PaddlePaddle is 2.6.2, PaddleClas is 2.6.0)
  • OS (native device): Windows 11 Enterprise 24H2 (10.0.26100 Build 26100)
  • Docker version: 27.3.1, build ce12230
  • NVIDIA Driver: 566.03

What is certain is that, although the tests above repeatedly reported a memory error, my memory was definitely not full when the script ran.
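One way to put a number on this observation is to record the process's peak resident memory at the moment the error is raised. The following is a minimal sketch, assuming a Linux or macOS host (Python's `resource` module is not available on Windows); `peak_rss_mib` and `run_and_report` are hypothetical helper names, not part of PaddleClas.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in KiB, macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(fn):
    """Call fn(); return (MemoryError or None, peak RSS in MiB afterwards)."""
    try:
        fn()
        error = None
    except MemoryError as exc:
        error = exc
    return error, peak_rss_mib()
```

Inside the container, something like `run_and_report(lambda: paddleclas.PaddleClas(model_name="text_image_orientation"))` would show how much memory the process had actually used when `std::bad_alloc` appeared. A small peak would be consistent with the error not being caused by real memory exhaustion.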

cainmagi avatar Jan 15 '25 20:01 cainmagi

Thank you for your feedback and the very detailed experiments! We will arrange to investigate this issue.

TingquanGao avatar Jan 20 '25 03:01 TingquanGao

Great! A very detailed solution. I tried rolling back the paddlepaddle version, and it did run successfully!

drawyaW avatar Feb 20 '25 01:02 drawyaW

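Evidently, the paddle-related libraries have no mechanism that automatically runs the sample tests at release time. In practice this is not hard: collect all the demo-level commands in one place and run them once. This is clearly the responsibility of the test team lead.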
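The idea of collecting the demo commands and running them once could be sketched roughly as follows. This is an illustration only, not the project's actual release process; `run_smoke_tests` and `DEMO_COMMANDS` are hypothetical names, and the single command listed is the test script from this thread.

```python
import subprocess
import sys


def run_smoke_tests(commands, timeout=600):
    """Run each command once; return a list of (command, reason) failures."""
    failures = []
    for cmd in commands:
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            failures.append((cmd, "timeout"))
            continue
        if proc.returncode != 0:
            # The last stderr line is usually the exception message.
            lines = proc.stderr.strip().splitlines()
            reason = lines[-1] if lines else f"exit code {proc.returncode}"
            failures.append((cmd, reason))
    return failures


# Hypothetical demo list; a real one would cover every documented demo.
DEMO_COMMANDS = [
    [
        sys.executable,
        "-c",
        "import paddleclas; "
        "paddleclas.PaddleClas(model_name='text_image_orientation')",
    ],
]
```

Running `run_smoke_tests(DEMO_COMMANDS)` in a fresh environment for each supported paddlepaddle version would return an empty list when every demo initializes, and the failing command plus its last error line otherwise.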

wang-kangkang avatar Feb 25 '25 07:02 wang-kangkang

Thank you for your feedback and the very detailed experiments! We will arrange to investigate this issue.

Hi, roughly when will paddle 2.6.2 be supported?

bonyz avatar Mar 25 '25 09:03 bonyz

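Thank you for your feedback and the very detailed experiments! We will arrange to investigate this issue.

Hi, roughly when will paddle 2.6.2 be supported?

I am on paddlepaddle-gpu 2.6.2. After I switched paddleclas to 2.5.2, the [MemoryError: std::bad_alloc] error went away.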

GarageOfRick avatar Apr 02 '25 13:04 GarageOfRick

@cainmagi Kudos to you. This is the most detailed verification I have seen.

csy19900206 avatar Aug 14 '25 02:08 csy19900206

With paddlepaddle-gpu 2.6.2 + paddleclas 2.5.2, the demo runs. I am on Ubuntu 22.04 + Conda3 + Python 3.8.20, without Docker. The hardware is a laptop with a 3060 Laptop GPU (6 GB).

csy19900206 avatar Aug 14 '25 02:08 csy19900206