mmcv
[Docker: Cannot Build TensorRT Plugin] - No CUDA runtime is found
Hi, I have searched the issues and it seems there is no solution for building mmcv with the TensorRT plugin inside Docker.
Dockerfile:

```dockerfile
FROM nvcr.io/nvidia/tensorrt:20.12-py3

# ... all other dependency installation ...

WORKDIR /src/mmcv
RUN MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
```
Without TRT, it works:

```dockerfile
RUN MMCV_WITH_OPS=1 pip install -e .
```
The installation also works when I run the container, which really confuses me. My current workflow is to build the image without mmcv, run the installation inside the container, and then commit the result into a new image. That works, but it would be better if I could build mmcv directly during the Docker image build.
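For reference, the build-run-commit workaround described above can be sketched roughly as follows (the image and container names are made up for illustration, and `--gpus all` assumes the NVIDIA container toolkit is installed):

```shell
# Hypothetical sketch of the build-run-commit workaround (names are made up).
docker build -t myproj:no-mmcv .                 # image without mmcv installed
docker run --gpus all --name mmcv-builder myproj:no-mmcv \
    bash -c 'cd /src/mmcv && MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .'
docker commit mmcv-builder myproj:with-mmcv      # re-commit into a new image
```

This works because the GPU (and hence the CUDA runtime) is visible inside a running container, but not during `docker build`.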
Any potential solutions?
Error:

```text
#30 2.823 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#30 2.823 Compiling mmcv._ext without CUDA
#30 2.823 running develop
#30 2.823 running egg_info
#30 2.823 creating mmcv_full.egg-info
#30 2.823 writing mmcv_full.egg-info/PKG-INFO
#30 2.823 writing dependency_links to mmcv_full.egg-info/dependency_links.txt
#30 2.823 writing requirements to mmcv_full.egg-info/requires.txt
#30 2.823 writing top-level names to mmcv_full.egg-info/top_level.txt
#30 2.823 writing manifest file 'mmcv_full.egg-info/SOURCES.txt'
#30 2.823 reading manifest file 'mmcv_full.egg-info/SOURCES.txt'
#30 2.823 reading manifest template 'MANIFEST.in'
#30 2.823 writing manifest file 'mmcv_full.egg-info/SOURCES.txt'
#30 2.823 running build_ext
#30 2.823 building 'mmcv._ext_trt' extension
#30 2.823 creating /src/mmcv/build
#30 2.823 creating /src/mmcv/build/temp.linux-x86_64-3.8
#30 2.823 creating /src/mmcv/build/temp.linux-x86_64-3.8/mmcv
#30 2.823 creating /src/mmcv/build/temp.linux-x86_64-3.8/mmcv/ops
#30 2.823 creating /src/mmcv/build/temp.linux-x86_64-3.8/mmcv/ops/csrc
#30 2.823 creating /src/mmcv/build/temp.linux-x86_64-3.8/mmcv/ops/csrc/tensorrt
#30 2.823 creating /src/mmcv/build/temp.linux-x86_64-3.8/mmcv/ops/csrc/tensorrt/plugins
#30 2.823 Traceback (most recent call last):
#30 2.823   File "<string>", line 2, in <module>
#30 2.823   File "<pip-setuptools-caller>", line 34, in <module>
#30 2.823   File "/src/mmcv/setup.py", line 375, in <module>
#30 2.823     setup(
#30 2.823   File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 144, in setup
#30 2.823     return distutils.core.setup(**attrs)
#30 2.823   File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
#30 2.823     dist.run_commands()
#30 2.823   File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
#30 2.823     self.run_command(cmd)
#30 2.823   File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
#30 2.823     cmd_obj.run()
#30 2.823   File "/usr/lib/python3/dist-packages/setuptools/command/develop.py", line 38, in run
#30 2.823     self.install_for_development()
#30 2.823   File "/usr/lib/python3/dist-packages/setuptools/command/develop.py", line 140, in install_for_development
#30 2.823     self.run_command('build_ext')
#30 2.823   File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
#30 2.823     self.distribution.run_command(command)
#30 2.823   File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
#30 2.823     cmd_obj.run()
#30 2.823   File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 87, in run
#30 2.823     _build_ext.run(self)
#30 2.823   File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
#30 2.823     self.build_extensions()
#30 2.823   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 708, in build_extensions
#30 2.823     build_ext.build_extensions(self)
#30 2.823   File "/usr/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
#30 2.823     self._build_extensions_serial()
#30 2.823   File "/usr/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
#30 2.823     self.build_extension(ext)
#30 2.823   File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 208, in build_extension
#30 2.823     _build_ext.build_extension(self, ext)
#30 2.823   File "/usr/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
#30 2.823     objects = self.compiler.compile(sources,
#30 2.823   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 524, in unix_wrap_ninja_compile
#30 2.823     cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
#30 2.823   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 423, in unix_cuda_flags
#30 2.823     cflags + _get_cuda_arch_flags(cflags))
#30 2.823   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1561, in _get_cuda_arch_flags
#30 2.823     arch_list[-1] += '+PTX'
#30 2.823 IndexError: list index out of range
#30 2.823 [end of output]
#30 2.823
#30 2.823 note: This error originates from a subprocess, and is likely not a problem with pip.
```
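A note on the traceback: it bottoms out in PyTorch's `_get_cuda_arch_flags`, which probes the visible GPUs to build an `arch_list`. With no GPU visible during `docker build`, the list stays empty and `arch_list[-1]` raises `IndexError`. A common workaround (a hedged sketch, not confirmed in this thread; the arch values below are examples, pick the ones your GPUs need) is to pin `TORCH_CUDA_ARCH_LIST` so no GPU probing is needed:

```shell
# Pin the CUDA arch list so torch's cpp_extension does not need to probe a
# GPU at image-build time. In a Dockerfile this would be:
#   ENV TORCH_CUDA_ARCH_LIST="7.0;7.5+PTX"
export TORCH_CUDA_ARCH_LIST="7.0;7.5+PTX"   # example values only

# Minimal reproduction of the failing line when the arch list is empty:
python3 - <<'EOF'
arch_list = []                  # what _get_cuda_arch_flags ends up with
try:
    arch_list[-1] += '+PTX'     # the exact line from the traceback above
except IndexError as e:
    print(f"IndexError: {e}")
EOF
```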
Hi @timothylimyl. It seems that `No CUDA runtime is found` is raised while building MMCV with TensorRT. Maybe you can refer to mmdeploy and its Dockerfile for some luck.
Hi @AllentDan, thanks. I will take a look to see whether it solves my problem and report back.
Hi @timothylimyl , is there any progress?
Hi @zhouzaida @AllentDan , sorry for the late reply.
Looking over mmdeploy, I cannot tell what is causing the issue. I am also using the official NVIDIA container. Here is my Dockerfile:

```dockerfile
FROM nvcr.io/nvidia/tensorrt:20.12-py3

ENV DEBIAN_FRONTEND=noninteractive
SHELL ["/bin/bash","-c"]
ENV FORCE_CUDA="1"
ENV mmcvVersion="v1.4.4"

# Install python libraries (includes torch==1.8.1+cu111, torchvision==0.9.1+cu111)
RUN mkdir /src
WORKDIR /src
ADD ./requirements.txt /src/
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

# Install other linux dependencies
RUN apt-get update && apt-get install -y python3-opencv \
    git \
    cmake \
    build-essential \
    curl \
    wget \
    gnupg2 \
    lsb-release \
    ca-certificates

RUN git clone https://github.com/open-mmlab/mmcv.git && \
    cd mmcv && \
    git checkout $mmcvVersion && \
    echo "export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}" >> ~/.bashrc && \
    echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" >> ~/.bashrc && \
    source ~/.bashrc && \
    MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
```
At the last line, if I change it to `pip install -e .` then it works. The issue is that while building the image, Docker cannot seem to find the CUDA path (which is why I tried adding the CUDA paths to `.bashrc`). That said, if I build the image and then run the container, I can run `MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .` inside it and everything works.
As an experiment I ran some PyTorch code, and I can confirm that PyTorch is unable to find CUDA while building the image. However, everything works when running the container (PyTorch + MMCV can find CUDA/TensorRT).

Edit: I used `RUN python3 -c "import torch;print(f'CUDA IS FOUND:{torch.cuda.is_available()}')"` to check during the image build.
I will start looking into using MMDeploy in the pipeline, but I have a lot of other dependencies in my project and they may not play well with the official MMDeploy Dockerfile.
It seems those environment variables should be specified in the Dockerfile. Please try the methods from mmdet issue 281 and the envs in the mmdeploy Dockerfile.
Hi @AllentDan, my previous Dockerfile already has those envs: `ENV FORCE_CUDA="1"` and `ENV DEBIAN_FRONTEND=noninteractive`.

Edit: adding my requirements.txt for completeness:

```text
--find-links https://download.pytorch.org/whl/torch_stable.html
torch==1.8.1+cu111
torchvision==0.9.1+cu111
numpy==1.21.4
onnx==1.10.2
pycuda==2021.1
```
After digging into the docker image, it was found that `TENSORRT_DIR` has no `targets` or `lib`. Please try:

```shell
sed -i '143,145d' setup.py && sed -i '142 i\ \ \ \ \ \ \ \ tensorrt_lib_path = "/usr/lib/x86_64-linux-gnu/"' setup.py
```

to change the `tensorrt_lib_path` before building mmcv with TensorRT.
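For readers unfamiliar with this `sed` idiom, here is a toy illustration of what such a delete-then-insert pair does (the file and line numbers below are made up for this demo, not mmcv's `setup.py`):

```shell
# Toy demo: delete a line range, then insert a replacement line (GNU sed).
printf 'a\nb\nold1\nold2\ne\n' > demo.txt
sed -i '3,4d' demo.txt                      # delete lines 3-4 in place
sed -i '3 i replacement_line' demo.txt      # insert before (new) line 3
cat demo.txt                                # a, b, replacement_line, e
```

The suggested command applies the same pattern to `setup.py`: drop the lines that derive the TensorRT library path and insert a hard-coded `tensorrt_lib_path` in their place.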
Not too sure what the `sed` command error means:

```text
 => ERROR [8/8] RUN git clone https://github.com/open-mmlab/mmcv.git &&    1.8s
------
 > [8/8] RUN git clone https://github.com/open-mmlab/mmcv.git &&
cd mmcv &&
git checkout v1.4.4 &&
sed -i '143,145d' setup.py && sed -i '142 i\ \ \ \ \ \ \ \ tensorrt_lib_path = "/usr/lib/x86_64-linux-gnu/"' setup.py
MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .:
#12 0.246 Cloning into 'mmcv'...
#12 1.740 sed: -e expression #1, char 1: unknown command: `.'
------
executor failed running [/bin/bash -c git clone https://github.com/open-mmlab/mmcv.git && cd mmcv && sed -i '143,145d' setup.py && sed -i '142 i\ \ \ \ \ \ \ \ tensorrt_lib_path = "/usr/lib/x86_64-linux-gnu/"' setup.py MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .]: exit code: 1
```

I think the main error is this: ``sed: -e expression #1, char 1: unknown command: `.'``
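A guess at the cause (my reading, not stated in the thread): the `RUN` line above has no `&&` between the `sed` command and the `MMCV_WITH_OPS=... pip install -e .` line, so everything after the expression is handed to `sed` as extra arguments, and the stray `-e .` makes GNU sed try to parse `.` as a script. The same message can be reproduced in isolation:

```shell
# Without '&&', the pip command's words become sed arguments; '-e .' makes
# GNU sed parse '.' as a script and fail before touching any file.
printf 'x\n' > f.txt
sed -i '1d' f.txt MMCV_WITH_OPS=1 pip install -e . 2>&1 || true
# (GNU sed) prints: sed: -e expression #1, char 1: unknown command: `.'
```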
I mean:

```dockerfile
RUN sed -i '144,145d' setup.py && sed -i '142 i\ \ \ \ \ \ \ \ tensorrt_lib_path = "/usr/lib/x86_64-linux-gnu/"' setup.py
```

And make sure `RUN python -c "import torch;print(torch.cuda.is_available())"` returns `True` during the docker build. If you find CUDA is not available while building the docker image, please try the methods here.