PaddleX
Question about using PaddleX in Docker
Problem description
I want to build a custom Docker image. When I build it with the following Dockerfile, the build fails at RUN paddlex --install.
File name: Dockerfile
FROM rayproject/ray-ml:2.30.0-py310-gpu
# Configure pip to use the Tsinghua mirror
ENV PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
ENV VLLM_USE_MODELSCOPE=True
# Configure conda to use the Tsinghua mirror
RUN conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ \
&& conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/ \
&& conda config --set show_channel_urls yes
# Install paddlepaddle-gpu and PaddleX
RUN python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
COPY ./pdpd/PaddleX /init/PaddleX
RUN pip install -e /init/PaddleX
RUN paddlex --install
# -----------------------------Base environment END-------------------------------------------------#
USER root
# Install SSL certificates
RUN apt-get update && apt-get install -y ca-certificates && update-ca-certificates
# Use this as the default certificate bundle
ENV REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
# Copy the certificate
COPY minio.crt /usr/local/share/ca-certificates/minio.crt
# Refresh the certificate store
RUN update-ca-certificates
USER ray
Error message:
=> ERROR [18/21] RUN paddlex --install 0.4s
------
> [18/21] RUN paddlex --install:
0.392 Error: Can not import paddle core while this file exists: /home/ray/anaconda3/lib/python3.10/site-packages/paddle/base/libpaddle.so
0.412 Traceback (most recent call last):
0.412 File "/home/ray/anaconda3/bin/paddlex", line 33, in <module>
0.412 sys.exit(load_entry_point('paddlex', 'console_scripts', 'paddlex')())
0.412 File "/home/ray/anaconda3/bin/paddlex", line 25, in importlib_load_entry_point
0.412 return next(matches).load()
0.412 File "/home/ray/anaconda3/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
0.412 module = import_module(match.group('module'))
0.412 File "/home/ray/anaconda3/lib/python3.10/importlib/__init__.py", line 126, in import_module
0.412 return _bootstrap._gcd_import(name[level:], package, level)
0.412 File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
0.412 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
0.412 File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
0.412 File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
0.412 File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
0.412 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
0.412 File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
0.412 File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
0.412 File "<frozen importlib._bootstrap_external>", line 883, in exec_module
0.412 File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
0.412 File "/init/PaddleX/paddlex/__init__.py", line 20, in <module>
0.412 from .modules import build_dataset_checker, build_trainer, build_evaluater, build_predictor
0.412 File "/init/PaddleX/paddlex/modules/__init__.py", line 16, in <module>
0.412 from .base import build_dataset_checker, build_trainer, build_evaluater, build_predictor, create_model, \
0.412 File "/init/PaddleX/paddlex/modules/base/__init__.py", line 18, in <module>
0.413 from .trainer import build_trainer, BaseTrainer, BaseTrainDeamon
0.413 File "/init/PaddleX/paddlex/modules/base/trainer/__init__.py", line 17, in <module>
0.413 from .trainer import build_trainer, BaseTrainer
0.413 File "/init/PaddleX/paddlex/modules/base/trainer/trainer.py", line 19, in <module>
0.413 from ..build_model import build_model
0.413 File "/init/PaddleX/paddlex/modules/base/build_model.py", line 18, in <module>
0.413 from ...utils.device import get_device
0.413 File "/init/PaddleX/paddlex/utils/device.py", line 16, in <module>
0.413 import paddle
0.413 File "/home/ray/anaconda3/lib/python3.10/site-packages/paddle/__init__.py", line 33, in <module>
0.413 from .base import core # noqa: F401
0.413 File "/home/ray/anaconda3/lib/python3.10/site-packages/paddle/base/__init__.py", line 38, in <module>
0.413 from . import ( # noqa: F401
0.413 File "/home/ray/anaconda3/lib/python3.10/site-packages/paddle/base/backward.py", line 25, in <module>
0.413 from . import core, framework, log_helper, unique_name
0.413 File "/home/ray/anaconda3/lib/python3.10/site-packages/paddle/base/core.py", line 384, in <module>
0.413 raise e
0.413 File "/home/ray/anaconda3/lib/python3.10/site-packages/paddle/base/core.py", line 267, in <module>
0.413 from . import libpaddle
0.413 ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
------
Dockerfile:87
--------------------
85 | COPY ./pdpd/PaddleX /init/PaddleX
86 | RUN pip install -e /init/PaddleX
87 | >>> RUN paddlex --install
88 |
89 | # -----------------------------Base environment END-------------------------------------------------#
--------------------
ERROR: failed to solve: process "/bin/bash -c paddlex --install" did not complete successfully: exit code: 1
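I tried adding a paddle-gpu verification step to the Dockerfile and got the same "libcuda.so.1 not found" error. After that, I tried running the steps manually inside a container. The session is shown below: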
root@xdzl-4090:/dev/data_16T/project/xd_ai/portal/images# docker run --rm -it rayproject/ray-ml:2.30.0-py310-gpu bash
==========
== CUDA ==
==========
CUDA Version 11.8.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
(base) ray@eed194f1a782:~$ RUN python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
bash: RUN: command not found
(base) ray@eed194f1a782:~$ python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
(base) ray@eed194f1a782:~$ python -c "import paddle; paddle.utils.run_check()"
Running verify PaddlePaddle program ...
I0926 00:56:08.805682 55 program_interpreter.cc:243] New Executor is Running.
W0926 00:56:08.807749 55 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.5, Runtime API Version: 11.8
W0926 00:56:08.808048 55 gpu_resources.cc:164] device: 0, cuDNN Version: 8.7.
I0926 00:56:08.961114 55 interpreter_util.cc:648] Standalone Executor is Used.
PaddlePaddle works well on 1 GPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
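I suspect the cause is that no GPU is available while Docker builds the image, whereas a GPU is available inside the running container. But the install step should not need a GPU at all, should it? Is there a good solution?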
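One way to test this suspicion is to check whether the dynamic linker can load libcuda.so.1 at all, without importing paddle. This library is normally provided by the host NVIDIA driver and mounted into the container when it starts, so it is typically missing during docker build. The following is only a minimal sketch: the helper name `cuda_driver_available` is made up for illustration and is not part of Paddle or PaddleX.

```python
import ctypes


def cuda_driver_available(lib_name: str = "libcuda.so.1") -> bool:
    """Return True if the dynamic linker can load the given library.

    Hypothetical helper for illustration only; it is not a Paddle or
    PaddleX API. It only checks that the shared object can be opened,
    not that a GPU is actually usable.
    """
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Raised when the shared object cannot be found or opened,
        # which matches the ImportError seen in the build log.
        return False
    return True


if __name__ == "__main__":
    print("libcuda.so.1 loadable:", cuda_driver_available())
```

If this is run both from a RUN step during the build and from a shell inside the running container, and the two results differ, that would support the idea that the failure comes from the build environment rather than from the PaddleX installation itself.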