
Dockerfile / docker-compose to help streamline build process

Open loeken opened this issue 1 year ago • 11 comments

Wanted to run in docker; used https://github.com/RedTopper's version from https://github.com/oobabooga/text-generation-webui/issues/174 as a base, modified slightly.

Added a small section to the README to explain how to start up; the defaults of this config run with < 4 GB of VRAM.
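For reference, the setup described above would look roughly like the following compose sketch. The service name, port, and arch-list default are illustrative assumptions, not copied from the PR:

```yaml
# Hypothetical sketch of the docker-compose setup described above.
version: "3"
services:
  text-generation-webui:
    build:
      context: .
      args:
        # Compute capability to build the CUDA kernels for (7.5 = Turing)
        TORCH_CUDA_ARCH_LIST: "7.5"
    ports:
      - "7860:7860"   # web UI
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```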

loeken avatar Mar 24 '23 23:03 loeken

Thanks! I was just about to start work on a similar PR.

I'm testing it now.

I think it would make more sense to use the source from the current directory, rather than pulling from the public git repo. This would make it easier for devs to test their patches within an isolated environment.
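As a sketch of that idea (paths are illustrative), the Dockerfile could copy the build context instead of cloning, so local patches get built:

```dockerfile
# Instead of: RUN git clone https://github.com/oobabooga/text-generation-webui
# copy the local working tree from the build context:
COPY . /app/text-generation-webui
WORKDIR /app/text-generation-webui
RUN pip3 install -r requirements.txt
```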

deece avatar Mar 25 '23 05:03 deece

Unfortunately, it looks like testing failed:

#19 58.41 [2/2] /usr/local/cuda/bin/nvcc  -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /build/quant_cuda_kernel.cu -o /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
#19 58.41 FAILED: /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o 
#19 58.41 /usr/local/cuda/bin/nvcc  -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /build/quant_cuda_kernel.cu -o /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
#19 58.41 nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41           detected during:
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
#19 58.41 (61): here
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
#19 58.41 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41           detected during:
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
#19 58.41 (61): here
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
#19 58.41 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41           detected during:
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
#19 58.41 (61): here
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
#19 58.41 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41           detected during:
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
#19 58.41 (61): here
#19 58.41             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
#19 58.41 
#19 58.41 /build/quant_cuda_kernel.cu(149): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41             argument types are: (double *, double)
#19 58.41           detected during instantiation of "void VecQuant2MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]" 
#19 58.41 (87): here
#19 58.41 
#19 58.41 /build/quant_cuda_kernel.cu(261): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41             argument types are: (double *, double)
#19 58.41           detected during instantiation of "void VecQuant3MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]" 
#19 58.41 (171): here
#19 58.41 
#19 58.41 /build/quant_cuda_kernel.cu(337): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41             argument types are: (double *, double)
#19 58.41           detected during instantiation of "void VecQuant4MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]" 
#19 58.41 (283): here
#19 58.41 
#19 58.41 /build/quant_cuda_kernel.cu(409): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41             argument types are: (double *, double)
#19 58.41           detected during instantiation of "void VecQuant8MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]" 
#19 58.41 (359): here
#19 58.41 
#19 58.41 4 errors detected in the compilation of "/build/quant_cuda_kernel.cu".
#19 58.42 ninja: build stopped: subcommand failed.
#19 58.42 Traceback (most recent call last):
#19 58.42   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
#19 58.42     subprocess.run(
#19 58.42   File "/usr/lib/python3.10/subprocess.py", line 524, in run
#19 58.42     raise CalledProcessError(retcode, process.args,
#19 58.42 subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
#19 58.42 
#19 58.42 The above exception was the direct cause of the following exception:
#19 58.42 
#19 58.42 Traceback (most recent call last):
#19 58.42   File "/build/setup_cuda.py", line 4, in <module>
#19 58.42     setup(
#19 58.42   File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
#19 58.42     return distutils.core.setup(**attrs)
#19 58.42   File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
#19 58.42     dist.run_commands()
#19 58.43   File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
#19 58.43     self.run_command(cmd)
#19 58.43   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43     cmd_obj.run()
#19 58.43   File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
#19 58.43     self.run_command('build')
#19 58.43   File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
#19 58.43     self.distribution.run_command(command)
#19 58.43   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43     cmd_obj.run()
#19 58.43   File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
#19 58.43     self.run_command(cmd_name)
#19 58.43   File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
#19 58.43     self.distribution.run_command(command)
#19 58.43   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43     cmd_obj.run()
#19 58.43   File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
#19 58.43     _build_ext.run(self)
#19 58.43   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
#19 58.43     self.build_extensions()
#19 58.43   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
#19 58.43     build_ext.build_extensions(self)
#19 58.43   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
#19 58.43     self._build_extensions_serial()
#19 58.43   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
#19 58.43     self.build_extension(ext)
#19 58.43   File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
#19 58.43     _build_ext.build_extension(self, ext)
#19 58.43   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
#19 58.43     objects = self.compiler.compile(sources,
#19 58.43   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
#19 58.43     _write_ninja_file_and_compile_objects(
#19 58.43   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
#19 58.43     _run_ninja_build(
#19 58.43   File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
#19 58.44     raise RuntimeError(message) from e
#19 58.44 RuntimeError: Error compiling objects for extension
------
executor failed running [/bin/sh -c python3 setup_cuda.py bdist_wheel -d .]: exit code: 1

deece avatar Mar 25 '23 05:03 deece

It looks like it wants this patch: https://github.com/qwopqwop200/GPTQ-for-LLaMa/pull/58

Bumping the GPTQ SHA to 841feedde876785bc8022ca48fd9c3ff626587e2 gets past this
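For context, the `atomicAdd(double*, double)` errors above come from the fact that a native double-precision `atomicAdd` only exists on compute capability >= 6.0. The standard workaround (and, as I understand it, roughly what the linked patch adds) is the compare-and-swap loop from the CUDA C Programming Guide:

```cuda
// Software fallback for atomicAdd(double*, double); the hardware only
// provides it natively on compute capability >= 6.0 (Pascal and later).
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
__device__ double atomicAdd(double* address, double val) {
  unsigned long long int* address_as_ull = (unsigned long long int*)address;
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    // Emulate the add with a CAS on the 64-bit bit pattern of the double
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
    // Loop repeats if another thread updated the value in between
  } while (assumed != old);
  return __longlong_as_double(old);
}
#endif
```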

deece avatar Mar 25 '23 05:03 deece

@deece have you tried setting the specific TORCH_CUDA_ARCH_LIST in the docker-compose to what your graphics card needs? The error you posted indicates that you didn't.

loeken avatar Mar 25 '23 09:03 loeken

Yup, my oldest card is an M40, which requires that patch.

deece avatar Mar 25 '23 09:03 deece

@deece with M40 do you mean a Quadro M4000?

loeken avatar Mar 25 '23 09:03 loeken

Tesla M40. I also have a Tesla K80, but it doesn't really get used.

deece avatar Mar 25 '23 09:03 deece

https://developer.nvidia.com/cuda-gpus <- based on that docs page, your M40 expects compute capability 5.2; try changing TORCH_CUDA_ARCH_LIST from 7.5 to 5.2 in the docker-compose.yml
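Assuming the arch list is exposed as a build arg as in this PR (the service name and file layout here are illustrative), that override would be a one-line change:

```yaml
# docker-compose.yml (sketch): build the CUDA kernels for a Tesla M40 (Maxwell, 5.2)
services:
  text-generation-webui:
    build:
      args:
        TORCH_CUDA_ARCH_LIST: "5.2"
```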

loeken avatar Mar 25 '23 09:03 loeken

I don't think that will work, as the patch mentioned above suggests that it will break for anything under 6.0.

That patch does work though, and all that is needed to get it is to roll the pinned commit forward a bit (I tested the current HEAD and that worked).

deece avatar Mar 25 '23 10:03 deece

@deece I tried your suggested SHA 841feedde876785bc8022ca48fd9c3ff626587e2 and HEAD, which made it fail with load_quant() missing 1 required positional argument: 'pre_layer'

I've updated the PR and moved all configs into an .env file, which might make it easier to test/compare.

loeken avatar Mar 25 '23 11:03 loeken

Thanks, I'm out all day tomorrow, but I'll have another crack on Monday

deece avatar Mar 25 '23 11:03 deece

@deece it now uses HEAD, updated it to work with the new changes ( https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode )

loeken avatar Mar 27 '23 10:03 loeken

How about also preloading extensions into the docker image?

MarlinMr avatar Mar 27 '23 19:03 MarlinMr

@MarlinMr mapped the extensions folder (and a few others) in the docker-compose

loeken avatar Mar 27 '23 20:03 loeken

Yeah, it makes sense for local configuration. But I was thinking more like pulling dependencies for the current supported extensions into the docker image.

MarlinMr avatar Mar 27 '23 20:03 MarlinMr

@MarlinMr it now runs pip3 installs for the extensions too, using the same caching as for the others; also exposed port 5000 for the API via docker-compose
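A sketch of how those per-extension installs might look in the Dockerfile (the loop and the BuildKit cache mount are illustrative, not copied from the PR):

```dockerfile
# Install each bundled extension's dependencies at build time,
# reusing the pip cache across builds via a BuildKit cache mount.
RUN --mount=type=cache,target=/root/.cache/pip \
    for req in extensions/*/requirements.txt; do \
        pip3 install -r "$req"; \
    done
```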

loeken avatar Mar 27 '23 21:03 loeken

@oobabooga mind merging this? would make it easier to hop branches and test in docker

loeken avatar Mar 29 '23 08:03 loeken

It might be worth squashing/refactoring the commits before merging the PR. Maybe even squashing it down to a single commit?

deece avatar Mar 29 '23 09:03 deece

There are a couple of variables missing from the sample env file:

WARNING: The HOST_API_PORT variable is not set. Defaulting to a blank string.
WARNING: The CONTAINER_API_PORT variable is not set. Defaulting to a blank string.
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.text-generation-webui.ports contains an invalid type, it should be a number, or an object
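Presumably the fix is just two more lines in the sample env file (the port values here are illustrative guesses, not taken from the PR):

```
# .env additions
HOST_API_PORT=5000
CONTAINER_API_PORT=5000
```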

deece avatar Mar 29 '23 10:03 deece

Yeah, this PR has turned into a bit of a mess; I'll close this one and create a new, clean one.

loeken avatar Mar 29 '23 10:03 loeken

https://github.com/oobabooga/text-generation-webui/pull/633

loeken avatar Mar 29 '23 11:03 loeken