TensorRT-LLM
Docker image built from source is too large
I cloned the latest main branch and the build succeeded, but the resulting image is too large: 357GB.
After building once again, the image grew even larger:
REPOSITORY             TAG      IMAGE ID       CREATED          SIZE
tensorrt_llm/release   latest   cf5c89066392   59 minutes ago   513GB
After re-cloning the code and rebuilding, the size is back to normal. I would like to understand the reason in detail.
I gave it a try and cannot reproduce the issue (the image takes 32GB on my side). Could you clean the related Docker image caches and rebuild again on the main branch?
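In case it helps, here is a minimal sketch of how the image and build caches could be cleared before rebuilding. The exact prune flags are my suggestion, not something prescribed by the TensorRT-LLM build scripts:

$ # show how much space images and the build cache currently use
$ docker system df
$ # remove the previously built image so its layers are not reused
$ docker rmi tensorrt_llm/release:latest
$ # drop dangling images and the BuildKit build cache
$ docker image prune --force
$ docker builder prune --all --force
$ # rebuild from a clean state
$ make -C docker release_build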
$ git log
commit f430a4b447ef4cba22698902d43eae0debf08594 (HEAD, origin/main)
Author: Kaiyu Xie <[email protected]>
Date: Tue May 28 20:07:49 2024 +0800
Update TensorRT-LLM (#1688)
* Update TensorRT-LLM
$ make -C docker release_build
make: Entering directory '/home/scratch.bhsueh_sw_1/workspace/TensorRT-LLM/tllm/docker'
Building docker image: tensorrt_llm/release:latest
DOCKER_BUILDKIT=1 docker build --pull \
--progress auto \
--build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
--build-arg BASE_TAG=24.04-py3 \
--build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks" \
--build-arg TORCH_INSTALL_TYPE="skip" \
\
\
\
\
\
--build-arg TRT_LLM_VER="0.11.0.dev2024052800" \
\
--build-arg GIT_COMMIT="f430a4b447ef4cba22698902d43eae0debf08594" \
--target release \
--file Dockerfile.multi \
--tag tensorrt_llm/release:latest \
..
[+] Building 9235.5s (42/42) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 257B 0.0s
=> [internal] load build definition from Dockerfile.multi 0.0s
=> => transferring dockerfile: 3.49kB 0.0s
=> [internal] load metadata for nvcr.io/nvidia/pytorch:24.04-py3 0.1s
=> [internal] load build context 9.8s
=> => transferring context: 1.19GB 9.8s
=> [base 1/1] FROM nvcr.io/nvidia/pytorch:24.04-py3@sha256:a18861747b08d8314969e61420d77377a67cc5d394d515520cc3ca83cc7261a4 0.0s
=> CACHED [devel 1/16] COPY docker/common/install_base.sh install_base.sh 0.0s
=> CACHED [devel 2/16] RUN bash ./install_base.sh && rm install_base.sh 0.0s
=> CACHED [devel 3/16] COPY docker/common/install_cmake.sh install_cmake.sh 0.0s
=> CACHED [devel 4/16] RUN bash ./install_cmake.sh && rm install_cmake.sh 0.0s
=> CACHED [devel 5/16] COPY docker/common/install_ccache.sh install_ccache.sh 0.0s
=> CACHED [devel 6/16] RUN bash ./install_ccache.sh && rm install_ccache.sh 0.0s
=> CACHED [devel 7/16] COPY docker/common/install_cuda_toolkit.sh install_cuda_toolkit.sh 0.0s
=> CACHED [devel 8/16] RUN bash ./install_cuda_toolkit.sh && rm install_cuda_toolkit.sh 0.0s
=> CACHED [devel 9/16] COPY docker/common/install_tensorrt.sh install_tensorrt.sh 0.0s
=> CACHED [devel 10/16] RUN bash ./install_tensorrt.sh --TRT_VER=${TRT_VER} --CUDA_VER=${CUDA_VER} --CUDNN_VER=${CUDNN_VER} --NCCL_VER=${NCCL_VER} --CUBLAS_VER=${CUBLAS_VER} && rm install_ten 0.0s
=> CACHED [devel 11/16] COPY docker/common/install_polygraphy.sh install_polygraphy.sh 0.0s
=> CACHED [devel 12/16] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh 0.0s
=> CACHED [devel 13/16] COPY docker/common/install_mpi4py.sh install_mpi4py.sh 0.0s
=> CACHED [devel 14/16] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh 0.0s
=> CACHED [devel 15/16] COPY docker/common/install_pytorch.sh install_pytorch.sh 0.0s
=> CACHED [devel 16/16] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh 0.0s
=> [release 1/11] WORKDIR /app/tensorrt_llm 2.5s
=> [wheel 1/9] WORKDIR /src/tensorrt_llm 2.5s
=> [wheel 2/9] COPY benchmarks benchmarks 0.0s
=> [wheel 3/9] COPY cpp cpp 2.6s
=> [wheel 4/9] COPY benchmarks benchmarks 0.0s
=> [wheel 5/9] COPY scripts scripts 0.0s
=> [wheel 6/9] COPY tensorrt_llm tensorrt_llm 0.0s
=> [wheel 7/9] COPY 3rdparty 3rdparty 0.8s
=> [wheel 8/9] COPY setup.py requirements.txt requirements-dev.txt ./ 0.0s
=> [wheel 9/9] RUN python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks 9150.1s
=> [release 2/11] COPY --from=wheel /src/tensorrt_llm/build/tensorrt_llm*.whl . 4.3s
=> [release 3/11] RUN pip install tensorrt_llm*.whl --extra-index-url https://pypi.nvidia.com && rm tensorrt_llm*.whl 41.0s
=> [release 4/11] COPY README.md ./ 0.0s
=> [release 5/11] COPY docs docs 0.0s
=> [release 6/11] COPY cpp/include include 0.0s
=> [release 7/11] RUN ln -sv $(python3 -c 'import site; print(f"{site.getsitepackages()[0]}/tensorrt_llm/libs")') lib && test -f lib/libnvinfer_plugin_tensorrt_llm.so && ln -sv lib/libnvinfer_plugin_tensorr 0.5s
=> [release 8/11] COPY --from=wheel /src/tensorrt_llm/benchmarks benchmarks 0.0s
=> [release 9/11] COPY --from=wheel /src/tensorrt_llm/cpp/build/benchmarks/bertBenchmark /src/tensorrt_llm/cpp/build/benchmarks/gptManagerBenchmark /src/tensorrt_llm/cpp/build/benchmarks/gptSessionBe 0.0s
=> [release 10/11] COPY examples examples 0.2s
=> [release 11/11] RUN chmod -R a+w examples && rm -v benchmarks/cpp/bertBenchmark.cpp benchmarks/cpp/gptManagerBenchmark.cpp benchmarks/cpp/gptSessionBenchmark.cpp benchmarks/cpp/CMakeLi 0.5s
=> exporting to image 13.7s
=> => exporting layers 13.7s
=> => writing image sha256:442c248e3648a086c63c1372c0333290081f09f6d655563dcfdda80335e9916d 0.0s
=> => naming to docker.io/tensorrt_llm/release:latest 0.0s
make: Leaving directory '/home/scratch.bhsueh_sw_1/workspace/TensorRT-LLM/tllm/docker'
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
tensorrt_llm/release latest 442c248e3648 5 minutes ago 32GB
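To narrow down which layers account for the 357GB/513GB images versus the expected 32GB, one could compare per-layer sizes of the two builds. This is only a diagnostic suggestion using standard Docker tooling, not a step from the thread:

$ # list every layer of the release image with its size and the command that created it
$ docker history --no-trunc --format 'table {{.Size}}\t{{.CreatedBy}}' tensorrt_llm/release:latest
$ # in a healthy build the large layers belong to the base image and the pip install step;
$ # unexpectedly large COPY or RUN layers point at a stale build context or reused cache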
You can build with a previous version, then pull the new commit and rebuild; that reproduces the issue.
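As I understand the reporter's repro, the sequence is roughly the following (the commit ID is a placeholder; the exact older commit was not given):

$ # build once from an older checkout
$ git checkout <older-commit-id>
$ make -C docker release_build
$ # update to the new commit and rebuild on top of the cached layers
$ git checkout main && git pull
$ make -C docker release_build
$ # compare the resulting image sizes
$ docker images tensorrt_llm/release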
I still cannot reproduce your issue after trying again. Could you share your commit id?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Issue has not received an update in over 14 days. Adding stale label.
This issue was closed because it has been 14 days without activity since it has been marked as stale.