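Official sample demo fails with MemoryError: std::bad_alloc
The script was written directly from the official sample demo.
- PaddleClas version 2.6.0, paddlepaddle version 2.6.2
- Python 3.8
- The CPU and GPU builds of paddle report the same error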
python run.py
2025-01-03 18:29:51 INFO: Loading faiss with AVX512 support.
2025-01-03 18:29:51 INFO: Successfully loaded faiss with AVX512 support.
[2025/01/03 18:29:51] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
File "run.py", line 2, in <module>
model = paddleclas.PaddleClas(model_name="person_attribute")
File "venv/lib/python3.8/site-packages/paddleclas/paddleclas.py", line 610, in __init__
self.predictor = ClsPredictor(self._config)
File "venv/lib/python3.8/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
super().__init__(config["Global"])
File "venv/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
self.predictor, self.config = self.create_paddle_predictor(
File "venv/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
predictor = create_predictor(config)
MemoryError: std::bad_alloc
Please provide the complete launch command.
@TingquanGao
Let me provide a way to reproduce this problem:
Reproduction steps
First, save the following test script, test.py, somewhere, for example at /example/test.py
import paddleclas
model = paddleclas.PaddleClas(model_name="text_image_orientation")
Then run the following command to start a Docker container. The container is a clean Debian Python image.
docker run --gpus all -it --rm --shm-size=1g -v "/example:/example" python:3.10-slim bash
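Inside the container, install the required dependencies and run the test script. The CPU build of the dependencies is installed here, because the GPU build of Paddle is currently incompatible with other libraries I use.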
# Make the dependencies of OpenCV complete.
apt-get update
apt-get -y install libgomp1 libgl1-mesa-glx libglib2.0-0
# Install PaddleClas
pip install paddlepaddle paddleclas
# Run the test.
cd /example
python test.py
Error output
This produces the following error:
[2025/01/15 20:08:51] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 402kiB/s]
[2025/01/15 20:09:12] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
File "/example/test.py", line 3, in <module>
model = paddleclas.PaddleClas(model_name="text_image_orientation")
File "/usr/local/lib/python3.10/site-packages/paddleclas/paddleclas.py", line 610, in __init__
self.predictor = ClsPredictor(self._config)
File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
super().__init__(config["Global"])
File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
self.predictor, self.config = self.create_paddle_predictor(
File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
predictor = create_predictor(config)
MemoryError: std::bad_alloc
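Because the traceback itself does not say which package versions were installed, it can help to attach a short environment report to a failure like this. The following is a minimal sketch, not part of PaddleClas; the helper names `package_versions` and `environment_report` are made up for illustration, and `importlib.metadata` requires Python 3.8 or newer.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Return {name: installed version or None} for each distribution name."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


def environment_report(names=("paddlepaddle", "paddlepaddle-gpu", "paddleclas")):
    """Build a short, paste-ready description of the current environment."""
    lines = [
        f"python: {platform.python_version()}",
        f"machine: {platform.machine()}",
    ]
    for name, version in package_versions(names).items():
        lines.append(f"{name}: {version or 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running this in the same container, right before `python test.py`, records the exact paddlepaddle and paddleclas versions next to the error.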
Further testing
Is it a shared memory problem?
Could the shared memory be too small? I tried changing the container launch arguments to:
docker run --gpus all -it --rm --shm-size=16g -v "/example:/example" python:3.10-slim bash
This is already larger than the size used in the tutorial.
The error is unchanged.
Is it because of the GPU mapping?
Try removing the --gpus all argument:
docker run -it --rm --shm-size=1g -v "/example:/example" python:3.10-slim bash
The error is unchanged.
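The launch commands tried so far differ only in the shared memory size and the GPU mapping. As a rough sketch, a small helper can assemble those variations so that each one is written down explicitly; the function name `docker_run_command` and its parameters are hypothetical, and only flags that already appear in the commands above are used.

```python
import shlex


def docker_run_command(image, shm_size="1g", gpus=True, mount="/example:/example"):
    """Assemble the `docker run` argument list for one test variation."""
    cmd = ["docker", "run"]
    if gpus:
        cmd += ["--gpus", "all"]
    cmd += ["-it", "--rm", f"--shm-size={shm_size}", "-v", mount, image, "bash"]
    return cmd


variations = [
    docker_run_command("python:3.10-slim"),                  # baseline
    docker_run_command("python:3.10-slim", shm_size="16g"),  # larger shared memory
    docker_run_command("python:3.10-slim", gpus=False),      # no GPU mapping
]
for cmd in variations:
    # Print each variation as a shell command line that can be pasted as-is.
    print(shlex.join(cmd))
```

The helper only builds the command lines; it does not start any container itself.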
Is the paddlepaddle installation broken, or is the CPU build of paddlepaddle unusable?
I happen to know that NVIDIA publishes a paddlepaddle image, so I ran it and repeated the test above:
docker run --gpus all -it --rm --shm-size=1g -v "/example:/example" nvcr.io/nvidia/paddlepaddle:24.10-py3
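Why 24.10 rather than a newer version (for example 24.12)? Because paddleclas does not support Python 3.12, and image version 24.10 is the last one that uses Ubuntu 22.04 and Python 3.10. Note that its CUDA is still nearly the latest, 12.6, and that paddlepaddle-gpu is already built in.
Because the image already contains the GPU build, paddlepaddle-gpu, the installation steps need a small change: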
# Make the dependencies of OpenCV complete.
# Note that this is an Ubuntu image.
apt-get update
apt-get -y install libgomp1 libegl1 libglu1-mesa-dev
# Do not need to install paddlepaddle because paddlepaddle-gpu already exists.
pip install paddleclas
# Run the test.
cd /example
python test.py
It still fails this time, but with a different error:
[2025/01/15 20:46:49] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 406kiB/s]
Traceback (most recent call last):
File "/example/test.py", line 3, in <module>
model = paddleclas.PaddleClas(model_name="text_image_orientation")
File "/usr/local/lib/python3.10/dist-packages/paddleclas/paddleclas.py", line 610, in __init__
self.predictor = ClsPredictor(self._config)
File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
super().__init__(config["Global"])
File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
self.predictor, self.config = self.create_paddle_predictor(
File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
predictor = create_predictor(config)
ValueError: basic_string::_M_replace_aux
Using the official paddle image
Follow the tutorial strictly and use the official paddle image.
docker run --gpus all --name ppcls -it --rm -v "/example:/example" --shm-size=8G --network=host paddlepaddle/paddle:2.3.0-gpu-cuda10.2-cudnn7 /bin/bash
Once it is running, install and test inside the container:
# Install dependencies. Do not need to fix OpenCV issues.
pip install paddleclas
# Run the test.
cd /example
python test.py
It runs successfully.
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
2025-01-15 21:16:35 INFO: Loading faiss with AVX2 support.
2025-01-15 21:16:35 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:16:35 INFO: Loading faiss.
2025-01-15 21:16:35 INFO: Successfully loaded faiss.
[2025/01/15 21:16:35] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 410kiB/s]
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:67: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
'nearest': Image.NEAREST,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:68: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
'bilinear': Image.BILINEAR,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:69: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
'bicubic': Image.BICUBIC,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:70: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
'box': Image.BOX,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:71: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
'lanczos': Image.LANCZOS,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:72: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
'hamming': Image.HAMMING,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:73: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
'random': (Image.BILINEAR, Image.BICUBIC)
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:73: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
'random': (Image.BILINEAR, Image.BICUBIC)
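Although it runs successfully, the image is based on Ubuntu 16.04 with Python 3.7. Using it would require a multi-container setup, which is far too cumbersome.
Using the official paddleclas image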
docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host paddlecloud/paddleclas:2.4-gpu-cuda11.2-cudnn8-latest /bin/bash
Once it is running, test directly inside the container:
# Run the test.
cd /example
python test.py
This one also runs normally.
Is it necessary to go back to Python 3.7?
Try falling back to a clean Debian image with Python 3.7:
docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.7-slim bash
and repeat the earlier installation and test steps. The test passes as well.
2025-01-15 21:33:18 INFO: Loading faiss with AVX2 support.
2025-01-15 21:33:18 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:33:18 INFO: Loading faiss.
2025-01-15 21:33:18 INFO: Successfully loaded faiss.
[2025/01/15 21:33:18] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:19<00:00, 380kiB/s]
[2025/01/15 21:33:39] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Then can Python 3.8 be used?
Switch to the Python 3.8 image:
docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.8-slim bash
and repeat the earlier installation and test steps. The test fails.
2025-01-15 21:38:40 INFO: Loading faiss with AVX512 support.
2025-01-15 21:38:40 INFO: Successfully loaded faiss with AVX512 support.
[2025/01/15 21:38:40] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:19<00:00, 383kiB/s]
[2025/01/15 21:39:02] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
File "test-ori.py", line 3, in <module>
model = paddleclas.PaddleClas(model_name="text_image_orientation")
File "/usr/local/lib/python3.8/site-packages/paddleclas/paddleclas.py", line 610, in __init__
self.predictor = ClsPredictor(self._config)
File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
super().__init__(config["Global"])
File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
self.predictor, self.config = self.create_paddle_predictor(
File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
predictor = create_predictor(config)
MemoryError: std::bad_alloc
So, is the problem in the paddlepaddle and paddleclas versions?
Still on the Python 3.8 image:
docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.8-slim bash
This time, pin paddlepaddle and paddleclas to older versions:
# Make the dependencies of OpenCV complete.
apt-get update
apt-get -y install libgomp1 libgl1-mesa-glx libglib2.0-0
# Install PaddleClas
pip install paddlepaddle==2.5.2 paddleclas==2.5.1
# Run the test.
cd /example
python test.py
It runs successfully:
2025-01-15 21:41:02 INFO: Loading faiss with AVX2 support.
2025-01-15 21:41:02 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:41:02 INFO: Loading faiss.
2025-01-15 21:41:02 INFO: Successfully loaded faiss.
[2025/01/15 21:41:02] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
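Conclusion
~~Intolerably, PaddleClas fails to initialize in all sorts of standard environments. I wonder whether its developers are even using amd64 CPUs.~~
It is confirmed that there is a compatibility problem between paddlepaddle and paddleclas versions. A suitable version pair must be specified, neither too new nor too old.
I later carried out further checks in a Python 3.8 environment: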
- paddlepaddle 2.5.2 and paddleclas 2.5.1 are compatible.
- paddlepaddle 2.5.2 and paddleclas 2.6.0 are compatible.
- paddlepaddle 2.6.0~2.6.2 and paddleclas 2.6.0 are incompatible. This raises MemoryError: std::bad_alloc
- The latest paddlepaddle 3.0.0rc0 and paddleclas 2.6.0 are also incompatible. A different error is raised.
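The observations in the list above can be written down as a small lookup table, which makes it easy to check a given version pair before installing. This is only a sketch of what was observed in this thread under Python 3.8, not an official compatibility matrix; the names `OBSERVED` and `observed_result` are hypothetical, and the range 2.6.0~2.6.2 is assumed to mean the three releases 2.6.0, 2.6.1, and 2.6.2.

```python
# Results observed in this thread under Python 3.8.
# Keys are (paddlepaddle version, paddleclas version).
# Assumption: "2.6.0~2.6.2" is expanded to the releases 2.6.0, 2.6.1, 2.6.2.
OBSERVED = {
    ("2.5.2", "2.5.1"): "works",
    ("2.5.2", "2.6.0"): "works",
    ("2.6.0", "2.6.0"): "MemoryError: std::bad_alloc",
    ("2.6.1", "2.6.0"): "MemoryError: std::bad_alloc",
    ("2.6.2", "2.6.0"): "MemoryError: std::bad_alloc",
    ("3.0.0rc0", "2.6.0"): "fails with a different error",
}


def observed_result(paddle_version, paddleclas_version):
    """Look up what this thread observed for a version pair, if anything."""
    return OBSERVED.get((paddle_version, paddleclas_version), "untested")


if __name__ == "__main__":
    for pair in [("2.5.2", "2.6.0"), ("2.6.2", "2.6.0"), ("2.4.0", "2.5.1")]:
        print(pair, "->", observed_result(*pair))
```

Pairs that were not tested here are reported as "untested" rather than guessed.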
My environment is as follows:
- CUDA (if used): Cuda compilation tools, release 12.6, V12.6.77
- OS (in container python:3.10-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.10, PaddlePaddle is 2.6.2, PaddleClas is 2.6.0)
- OS (in container nvidia/paddlepaddle): Ubuntu 22.04.5 LTS (Python is 3.10, PaddlePaddle is 2.6.1, PaddleClas is 2.6.0)
- OS (in container paddlepaddle/paddle): Ubuntu 16.04.7 LTS (Python is 3.7, PaddlePaddle is 2.3.0, PaddleClas is 2.5.1)
- OS (in container paddlecloud/paddleclas): Ubuntu 18.04.5 LTS (Python is 3.7, PaddlePaddle is 2.3.0.post112, PaddleClas is 0.0.0 (actually it should be 2.4, so this seems to be a dev version))
- OS (in container python:3.7-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.7, PaddlePaddle is 2.5.2, PaddleClas is 2.5.1)
- OS (in container python:3.8-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.8, PaddlePaddle is 2.6.2, PaddleClas is 2.6.0)
- OS (native device): Windows 11 Enterprise 24H2 (10.0.26100 Build 26100)
- Docker version: 27.3.1, build ce12230
- NVIDIA Driver: 566.03
What is certain is that, although the tests above repeatedly reported a memory error, my RAM was definitely not full when the script ran.
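One way to back up the claim that memory was not exhausted is to record the peak resident memory of the process at the moment the error is raised. The sketch below uses the standard `resource` module, which is available on Unix-like systems only (the Linux containers above qualify); the helper names `peak_rss_mib` and `run_and_report` are made up for illustration.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(func):
    """Call func(); return (result or raised exception, peak RSS in MiB)."""
    try:
        outcome = func()
    except Exception as exc:  # MemoryError is a subclass of Exception
        outcome = exc
    return outcome, peak_rss_mib()


# Intended use inside the container (not executed here):
#   import paddleclas
#   outcome, peak = run_and_report(
#       lambda: paddleclas.PaddleClas(model_name="text_image_orientation"))
#   print(type(outcome).__name__, f"peak RSS: {peak:.1f} MiB")
```

If the reported peak is far below the memory available to the container, the `std::bad_alloc` is unlikely to come from a genuine out-of-memory condition.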
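Thank you for the feedback and the very detailed experiments! We will arrange to investigate this issue.
Nice! A very detailed solution. I tried rolling back the paddlepaddle version and it does run successfully!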
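This shows that the paddle libraries are released without any mechanism that automatically runs the sample tests. That would not actually be hard: gather the demo-level commands in one place and run them once. This is clearly the responsibility of the test team lead.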
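The idea of gathering demo-level commands and running them once can be sketched as a tiny smoke-test runner. This is an illustration under stated assumptions, not how the Paddle projects actually test their releases; the function name `run_smoke_tests` is hypothetical, and the commands in the usage example are stand-ins for real demo scripts.

```python
import subprocess
import sys


def run_smoke_tests(commands, timeout=600):
    """Run each command; return (command, returncode, last stderr line) per failure."""
    failures = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        if proc.returncode != 0:
            # Keep only the last stderr line, which for Python is the exception message.
            tail = proc.stderr.strip().splitlines()[-1:]
            failures.append((cmd, proc.returncode, tail[0] if tail else ""))
    return failures


if __name__ == "__main__":
    # Stand-in commands; a real list would call the demo scripts instead.
    demo_commands = [
        [sys.executable, "-c", "print('demo ok')"],
        [sys.executable, "-c", "raise MemoryError('std::bad_alloc')"],
    ]
    for cmd, code, message in run_smoke_tests(demo_commands):
        print(f"FAILED ({code}): {message}")
```

Each demo runs in its own process, so a crash in one demo does not stop the remaining ones from being checked.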
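> Thank you for the feedback and the very detailed experiments! We will arrange to investigate this issue.
Hi, roughly when will paddle 2.6.2 be supported?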
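I am on paddlepaddle-gpu 2.6.2; after switching paddleclas to 2.5.2, the [MemoryError: std::bad_alloc] error disappeared.
@cainmagi Kudos to you. This is the most detailed verification I have seen.
With paddlepaddle-gpu 2.6.2 + paddleclas 2.5.2 the demo runs. I am on Ubuntu 22.04 + Conda3 + Python 3.8.20, without Docker; the hardware is a laptop with a 3060 Laptop 6G.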