
Docker image built from source is too large

Open Fred-cell opened this issue 1 year ago • 6 comments


I cloned the latest main branch and built it successfully, but the resulting image is too large: 357GB.

Fred-cell avatar May 30 '24 15:05 Fred-cell

I built it once more and the image became even larger:

REPOSITORY             TAG      IMAGE ID       CREATED          SIZE
tensorrt_llm/release   latest   cf5c89066392   59 minutes ago   513GB

Fred-cell avatar May 31 '24 02:05 Fred-cell

I re-cloned the code and rebuilt, and it works. I would like to know the reason in detail.

Fred-cell avatar May 31 '24 05:05 Fred-cell

I tried it and cannot reproduce the issue (the image is 32GB on my side). Could you clean the related Docker image caches and rebuild on the main branch?

$ git log

commit f430a4b447ef4cba22698902d43eae0debf08594 (HEAD, origin/main)
Author: Kaiyu Xie <[email protected]>
Date:   Tue May 28 20:07:49 2024 +0800

    Update TensorRT-LLM (#1688)
    
    * Update TensorRT-LLM
$ make -C docker release_build
make: Entering directory '/home/scratch.bhsueh_sw_1/workspace/TensorRT-LLM/tllm/docker'
Building docker image: tensorrt_llm/release:latest
DOCKER_BUILDKIT=1 docker build --pull  \
        --progress auto \
         --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
         --build-arg BASE_TAG=24.04-py3 \
         --build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks" \
         --build-arg TORCH_INSTALL_TYPE="skip" \
         \
         \
         \
         \
         \
         --build-arg TRT_LLM_VER="0.11.0.dev2024052800" \
         \
         --build-arg GIT_COMMIT="f430a4b447ef4cba22698902d43eae0debf08594" \
         --target release \
        --file Dockerfile.multi \
        --tag tensorrt_llm/release:latest \
        ..
[+] Building 9235.5s (42/42) FINISHED                                                                                                                                                                           docker:default
 => [internal] load .dockerignore                                                                                                                                                                                         0.0s
 => => transferring context: 257B                                                                                                                                                                                         0.0s
 => [internal] load build definition from Dockerfile.multi                                                                                                                                                                0.0s
 => => transferring dockerfile: 3.49kB                                                                                                                                                                                    0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:24.04-py3                                                                                                                                                         0.1s
 => [internal] load build context                                                                                                                                                                                         9.8s
 => => transferring context: 1.19GB                                                                                                                                                                                       9.8s
 => [base 1/1] FROM nvcr.io/nvidia/pytorch:24.04-py3@sha256:a18861747b08d8314969e61420d77377a67cc5d394d515520cc3ca83cc7261a4                                                                                              0.0s
 => CACHED [devel  1/16] COPY docker/common/install_base.sh install_base.sh                                                                                                                                               0.0s
 => CACHED [devel  2/16] RUN bash ./install_base.sh && rm install_base.sh                                                                                                                                                 0.0s
 => CACHED [devel  3/16] COPY docker/common/install_cmake.sh install_cmake.sh                                                                                                                                             0.0s
 => CACHED [devel  4/16] RUN bash ./install_cmake.sh && rm install_cmake.sh                                                                                                                                               0.0s
 => CACHED [devel  5/16] COPY docker/common/install_ccache.sh install_ccache.sh                                                                                                                                           0.0s
 => CACHED [devel  6/16] RUN bash ./install_ccache.sh && rm install_ccache.sh                                                                                                                                             0.0s
 => CACHED [devel  7/16] COPY docker/common/install_cuda_toolkit.sh install_cuda_toolkit.sh                                                                                                                               0.0s
 => CACHED [devel  8/16] RUN bash ./install_cuda_toolkit.sh && rm install_cuda_toolkit.sh                                                                                                                                 0.0s
 => CACHED [devel  9/16] COPY docker/common/install_tensorrt.sh install_tensorrt.sh                                                                                                                                       0.0s
 => CACHED [devel 10/16] RUN bash ./install_tensorrt.sh     --TRT_VER=${TRT_VER}     --CUDA_VER=${CUDA_VER}     --CUDNN_VER=${CUDNN_VER}     --NCCL_VER=${NCCL_VER}     --CUBLAS_VER=${CUBLAS_VER} &&     rm install_ten  0.0s
 => CACHED [devel 11/16] COPY docker/common/install_polygraphy.sh install_polygraphy.sh                                                                                                                                   0.0s
 => CACHED [devel 12/16] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh                                                                                                                                     0.0s
 => CACHED [devel 13/16] COPY docker/common/install_mpi4py.sh install_mpi4py.sh                                                                                                                                           0.0s
 => CACHED [devel 14/16] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh                                                                                                                                             0.0s
 => CACHED [devel 15/16] COPY docker/common/install_pytorch.sh install_pytorch.sh                                                                                                                                         0.0s
 => CACHED [devel 16/16] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh                                                                                                                                      0.0s
 => [release  1/11] WORKDIR /app/tensorrt_llm                                                                                                                                                                             2.5s
 => [wheel 1/9] WORKDIR /src/tensorrt_llm                                                                                                                                                                                 2.5s
 => [wheel 2/9] COPY benchmarks benchmarks                                                                                                                                                                                0.0s
 => [wheel 3/9] COPY cpp cpp                                                                                                                                                                                              2.6s
 => [wheel 4/9] COPY benchmarks benchmarks                                                                                                                                                                                0.0s
 => [wheel 5/9] COPY scripts scripts                                                                                                                                                                                      0.0s
 => [wheel 6/9] COPY tensorrt_llm tensorrt_llm                                                                                                                                                                            0.0s
 => [wheel 7/9] COPY 3rdparty 3rdparty                                                                                                                                                                                    0.8s
 => [wheel 8/9] COPY setup.py requirements.txt requirements-dev.txt ./                                                                                                                                                    0.0s
 => [wheel 9/9] RUN python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks                                                                                               9150.1s
 => [release  2/11] COPY --from=wheel /src/tensorrt_llm/build/tensorrt_llm*.whl .                                                                                                                                         4.3s
 => [release  3/11] RUN pip install tensorrt_llm*.whl --extra-index-url https://pypi.nvidia.com &&     rm tensorrt_llm*.whl                                                                                              41.0s
 => [release  4/11] COPY README.md ./                                                                                                                                                                                     0.0s
 => [release  5/11] COPY docs docs                                                                                                                                                                                        0.0s
 => [release  6/11] COPY cpp/include include                                                                                                                                                                              0.0s
 => [release  7/11] RUN ln -sv $(python3 -c 'import site; print(f"{site.getsitepackages()[0]}/tensorrt_llm/libs")') lib &&     test -f lib/libnvinfer_plugin_tensorrt_llm.so &&     ln -sv lib/libnvinfer_plugin_tensorr  0.5s
 => [release  8/11] COPY --from=wheel /src/tensorrt_llm/benchmarks benchmarks                                                                                                                                             0.0s
 => [release  9/11] COPY --from=wheel      /src/tensorrt_llm/cpp/build/benchmarks/bertBenchmark      /src/tensorrt_llm/cpp/build/benchmarks/gptManagerBenchmark      /src/tensorrt_llm/cpp/build/benchmarks/gptSessionBe  0.0s
 => [release 10/11] COPY examples examples                                                                                                                                                                                0.2s
 => [release 11/11] RUN chmod -R a+w examples &&     rm -v       benchmarks/cpp/bertBenchmark.cpp       benchmarks/cpp/gptManagerBenchmark.cpp       benchmarks/cpp/gptSessionBenchmark.cpp       benchmarks/cpp/CMakeLi  0.5s
 => exporting to image                                                                                                                                                                                                   13.7s
 => => exporting layers                                                                                                                                                                                                  13.7s
 => => writing image sha256:442c248e3648a086c63c1372c0333290081f09f6d655563dcfdda80335e9916d                                                                                                                              0.0s
 => => naming to docker.io/tensorrt_llm/release:latest                                                                                                                                                                    0.0s
make: Leaving directory '/home/scratch.bhsueh_sw_1/workspace/TensorRT-LLM/tllm/docker'
$ docker images
REPOSITORY                                                                               TAG                                                                             IMAGE ID       CREATED         SIZE
tensorrt_llm/release                                                                     latest                                                                          442c248e3648   5 minutes ago   32GB
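
For reference, the cache cleanup suggested above could look roughly like the sketch below. It is not taken from the TensorRT-LLM documentation; the `run` and `clean_and_rebuild` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and the script only prints the commands unless `DRY_RUN=0` is set. The Docker subcommands themselves are standard, and `make -C docker release_build` is the build command from the log above.

```shell
# Sketch: inspect and clear Docker caches, then rebuild the release image.
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo a command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical helper: the cleanup and rebuild sequence.
clean_and_rebuild() {
    run docker system df              # disk usage of images, containers, build cache
    run docker builder prune --force  # drop the BuildKit build cache
    run docker image prune --force    # drop dangling (untagged) images
    run make -C docker release_build  # rebuild, run from the repository root
}

clean_and_rebuild
```

Running it with `DRY_RUN=0` from the repository root would execute the four commands in order. Pruning removes cached layers for every project on the host, so the next build of any image starts from scratch.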

byshiue avatar May 31 '24 07:05 byshiue

If you build with a previous version, then pull a newer commit and rebuild, you can reproduce this issue.
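
One way to check whether leftovers in the working tree play a role is to compare directory sizes before and after the rebuild. This is only a sketch under an unconfirmed assumption (that artifacts from the earlier build end up in the Docker build context); `tree_sizes` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: list the largest top-level directories (sizes in KiB),
# largest first. $1 = directory to inspect (default: .), $2 = rows to show (default: 10).
tree_sizes() {
    du -sk "${1:-.}"/*/ 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example usage, run from the repository root:
#   tree_sizes . 10
#   git status --short --ignored                 # untracked and ignored files left by earlier builds
#   docker history tensorrt_llm/release:latest   # per-layer sizes of the built image
```

If one directory is far larger in the old checkout than in a fresh clone, `docker history` may show a correspondingly large `COPY` layer in the image.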

Fred-cell avatar May 31 '24 07:05 Fred-cell

I still cannot reproduce your issue after trying again. Could you share your commit id?

byshiue avatar Jun 06 '24 03:06 byshiue

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar Jul 07 '24 01:07 github-actions[bot]

Issue has not received an update in over 14 days. Adding stale label.

github-actions[bot] avatar Dec 04 '24 17:12 github-actions[bot]

This issue was closed because it has been 14 days without activity since it has been marked as stale.

github-actions[bot] avatar Dec 18 '24 18:12 github-actions[bot]