🐛 [Bug] Docker build fails
Bug Description
Sending build context to Docker daemon 29.57MB
Step 1/34 : ARG BASE=22.01
Step 2/34 : ARG BASE_IMG=nvcr.io/nvidia/pytorch:${BASE}-py3
Step 3/34 : FROM ${BASE_IMG} as base
---> 3f290bb26216
Step 4/34 : FROM base as torch-tensorrt-builder-base
---> 3f290bb26216
Step 5/34 : RUN rm -rf /opt/pytorch/torch_tensorrt /usr/bin/bazel
---> Using cache
---> 6a5e47ade331
Step 6/34 : ARG ARCH="x86_64"
---> Using cache
---> 94acf4d3746c
Step 7/34 : ARG TARGETARCH="amd64"
---> Using cache
---> 6e159a1f5635
Step 8/34 : ARG BAZEL_VERSION=4.2.1
---> Using cache
---> 981ce2761707
Step 9/34 : RUN [[ "$TARGETARCH" == "amd64" ]] && ARCH="x86_64" || ARCH="${TARGETARCH}" && wget -q https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-linux-${ARCH} -O /usr/bin/bazel && chmod a+x /usr/bin/bazel
---> Using cache
---> 3387313ebde0
Step 10/34 : RUN touch /usr/lib/$HOSTTYPE-linux-gnu/libnvinfer_static.a
---> Using cache
---> 2be1d77d6422
Step 11/34 : RUN rm -rf /usr/local/cuda/lib* /usr/local/cuda/include && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux/lib /usr/local/cuda/lib64 && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux/include /usr/local/cuda/include
---> Using cache
---> 97185e01fe2f
Step 12/34 : RUN apt-get update && apt-get install -y --no-install-recommends locales ninja-build && rm -rf /var/lib/apt/lists/* && locale-gen en_US.UTF-8
---> Using cache
---> 85a8c0dd0f06
Step 13/34 : FROM torch-tensorrt-builder-base as torch-tensorrt-builder
---> 85a8c0dd0f06
Step 14/34 : RUN rm -rf /opt/pytorch/torch_tensorrt
---> Using cache
---> 05475178cb1e
Step 15/34 : COPY . /workspace/torch_tensorrt/src
---> Using cache
---> 5c2b45977e72
Step 16/34 : WORKDIR /workspace/torch_tensorrt/src
---> Using cache
---> 080d93900192
Step 17/34 : RUN cp ./docker/WORKSPACE.docker WORKSPACE
---> Using cache
---> f2a6f1878089
Step 18/34 : RUN ./docker/dist-build.sh
---> Running in fb5b50c7fca3
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running bdist_wheel
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
Loading: 0 packages loaded
Analyzing: target //:libtorchtrt (1 packages loaded, 0 targets configured)
INFO: Analyzed target //:libtorchtrt (43 packages loaded, 3021 targets configured).
INFO: Found 1 target...
[0 / 81] [Prepa] Writing file cpp/lib/libtorchtrt.so-2.params ... (2 actions, 0 running)
ERROR: /workspace/torch_tensorrt/src/core/lowering/passes/BUILD:10:11: Compiling core/lowering/passes/linear_to_addmm.cpp failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 61 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
core/lowering/passes/linear_to_addmm.cpp: In function 'void torch_tensorrt::core::lowering::passes::replaceLinearWithBiasNonePattern(std::shared_ptr<torch::jit::Graph>)':
core/lowering/passes/linear_to_addmm.cpp:37:93: error: 'struct torch::jit::Function' has no member named 'graph'
37 | std::shared_ptr<torch::jit::Graph> d_graph = decompose_funcs.get_function("linear").graph();
| ^~~~~
Target //:libtorchtrt failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 166.072s, Critical Path: 17.92s
INFO: 1243 processes: 1196 internal, 47 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
using CXX11 ABI build
building libtorchtrt
The command '/bin/sh -c ./docker/dist-build.sh' returned a non-zero code: 1
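For context, the failing line (core/lowering/passes/linear_to_addmm.cpp:37 in the error above) retrieves the decomposition graph for aten::linear from a scripted CompilationUnit via torch::jit::Function::graph(). A minimal sketch of that pattern is below; the decomposition body is illustrative, not the project's exact source. This is the call the container's libtorch rejects, since its torch::jit::Function no longer exposes graph().

```cpp
// Minimal sketch of the pattern that fails to compile (illustrative
// decomposition body, not the project's exact source).
#include <memory>

#include <torch/csrc/jit/api/compilation_unit.h>
#include <torch/csrc/jit/ir/ir.h>

std::shared_ptr<torch::jit::Graph> linear_decomposition_graph() {
  // Scripted decomposition registered in a CompilationUnit, as in the
  // lowering pass named by the error.
  static torch::jit::CompilationUnit decompose_funcs(R"SCRIPT(
def linear(self: Tensor, mat1: Tensor, mat2: Tensor):
    return torch.matmul(self, mat1.t())
)SCRIPT");
  // This is the call gcc rejects: the libtorch inside the container no
  // longer exposes graph() on torch::jit::Function.
  return decompose_funcs.get_function("linear").graph();
}
```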
To Reproduce
Steps to reproduce the behavior:
git clone https://github.com/NVIDIA/Torch-TensorRT
cd Torch-TensorRT
docker build --build-arg BASE=22.01 -f docker/Dockerfile -t torch_tensorrt:latest .
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0):
- PyTorch Version (e.g. 1.0):
- CPU Architecture:
- OS (e.g., Linux):
- How you installed PyTorch (conda, pip, libtorch, source):
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version:
- CUDA version:
- GPU models and configuration:
- Any other relevant information:
Additional context
@Biaocsu: Which branch are you using? Please use release/ngc/22.01 branch if you want to build the source inside the NGC container.
I used the latest master branch. OK, I will use the release/ngc/22.01 branch.
Experiencing the same issue here. I checked out release/ngc/22.01 and it's working. Thanks.
@andi4191 Can this issue be solved? It has been open for a long time. I need to build from the master branch because it contains fixes that the other branches do not, so this is a real blocker for me. Please help resolve and verify this problem.
Hi @chophilip21 @Biaocsu,
It seems the error is related to this API change, made for the NGC 22.02 release: https://github.com/NVIDIA/Torch-TensorRT/commit/449723ef014fc839cbed5fda0824419a902bd403#
However, I am able to build the release/ngc/22.01 branch with BASE=22.01. Can you please check that you have checked out the correct version?
To summarize, I haven't been able to reproduce this issue.
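For anyone hitting this on a libtorch where torch::jit::Function no longer has graph(), here is a minimal sketch of the adaptation; this is my reading of the API change referenced above, not the exact upstream patch. The graph is reached through GraphFunction via torch::jit::toGraphFunction, and the helper name below is hypothetical.

```cpp
// Hedged sketch of the adaptation: go through GraphFunction instead of
// calling graph() on the base Function. The helper name is hypothetical.
#include <memory>

#include <torch/csrc/jit/api/function_impl.h> // torch::jit::toGraphFunction
#include <torch/csrc/jit/ir/ir.h>

std::shared_ptr<torch::jit::Graph> graph_of(torch::jit::Function& fn) {
  // Old: fn.graph();  // error: 'Function' has no member named 'graph'
  return torch::jit::toGraphFunction(fn).graph();
}
```

At the call site from the log, that would read torch::jit::toGraphFunction(decompose_funcs.get_function("linear")).graph().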
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.