text-generation-webui
Dockerfile / docker-compose to help streamline build process
Wanted to run in docker; used https://github.com/RedTopper's version from https://github.com/oobabooga/text-generation-webui/issues/174 as a base, modified slightly.
Added a small section to the readme explaining how to start it up; the defaults of this config run with < 4GB of VRAM.
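For reference, the startup flow described in that readme section amounts to something like the following (a sketch only; it assumes Docker with the NVIDIA container toolkit is installed and that the compose file sits at the repository root):

```shell
# Build the image (this is where the CUDA extensions get compiled),
# then bring the webui up; add -d to run detached.
docker compose build
docker compose up
```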
Thanks! I was just about to start work on a similar PR.
I'm testing it now.
I think it would make more sense to use the source from the current directory, rather than pulling from the public git repo. This would make it easier for devs to test their patches within an isolated environment.
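In the Dockerfile that would mean copying the build context in rather than cloning, roughly like this (a sketch; the /app path and the requirements install step are assumptions, not the PR's actual layout):

```dockerfile
# Copy the local checkout into the image instead of pulling from GitHub,
# so local patches are picked up by the build.
COPY . /app
WORKDIR /app
RUN pip3 install -r requirements.txt
```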
Unfortunately, it looks like testing failed:
#19 58.41 [2/2] /usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /build/quant_cuda_kernel.cu -o /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
#19 58.41 FAILED: /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o
#19 58.41 /usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /build/quant_cuda_kernel.cu -o /build/build/temp.linux-x86_64-3.10/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
#19 58.41 nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41 detected during:
#19 58.41 instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
#19 58.41 (61): here
#19 58.41 instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
#19 58.41
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41 detected during:
#19 58.41 instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
#19 58.41 (61): here
#19 58.41 instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
#19 58.41
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41 detected during:
#19 58.41 instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
#19 58.41 (61): here
#19 58.41 instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
#19 58.41
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
#19 58.41 detected during:
#19 58.41 instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
#19 58.41 (61): here
#19 58.41 instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
#19 58.41 /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
#19 58.41
#19 58.41 /build/quant_cuda_kernel.cu(149): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41 argument types are: (double *, double)
#19 58.41 detected during instantiation of "void VecQuant2MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]"
#19 58.41 (87): here
#19 58.41
#19 58.41 /build/quant_cuda_kernel.cu(261): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41 argument types are: (double *, double)
#19 58.41 detected during instantiation of "void VecQuant3MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]"
#19 58.41 (171): here
#19 58.41
#19 58.41 /build/quant_cuda_kernel.cu(337): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41 argument types are: (double *, double)
#19 58.41 detected during instantiation of "void VecQuant4MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]"
#19 58.41 (283): here
#19 58.41
#19 58.41 /build/quant_cuda_kernel.cu(409): error: no instance of overloaded function "atomicAdd" matches the argument list
#19 58.41 argument types are: (double *, double)
#19 58.41 detected during instantiation of "void VecQuant8MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const scalar_t *, int, int, int, int) [with scalar_t=double]"
#19 58.41 (359): here
#19 58.41
#19 58.41 4 errors detected in the compilation of "/build/quant_cuda_kernel.cu".
#19 58.42 ninja: build stopped: subcommand failed.
#19 58.42 Traceback (most recent call last):
#19 58.42 File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
#19 58.42 subprocess.run(
#19 58.42 File "/usr/lib/python3.10/subprocess.py", line 524, in run
#19 58.42 raise CalledProcessError(retcode, process.args,
#19 58.42 subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
#19 58.42
#19 58.42 The above exception was the direct cause of the following exception:
#19 58.42
#19 58.42 Traceback (most recent call last):
#19 58.42 File "/build/setup_cuda.py", line 4, in <module>
#19 58.42 setup(
#19 58.42 File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
#19 58.42 return distutils.core.setup(**attrs)
#19 58.42 File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
#19 58.42 dist.run_commands()
#19 58.43 File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
#19 58.43 self.run_command(cmd)
#19 58.43 File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43 cmd_obj.run()
#19 58.43 File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
#19 58.43 self.run_command('build')
#19 58.43 File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
#19 58.43 self.distribution.run_command(command)
#19 58.43 File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43 cmd_obj.run()
#19 58.43 File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
#19 58.43 self.run_command(cmd_name)
#19 58.43 File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
#19 58.43 self.distribution.run_command(command)
#19 58.43 File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#19 58.43 cmd_obj.run()
#19 58.43 File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
#19 58.43 _build_ext.run(self)
#19 58.43 File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
#19 58.43 self.build_extensions()
#19 58.43 File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
#19 58.43 build_ext.build_extensions(self)
#19 58.43 File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
#19 58.43 self._build_extensions_serial()
#19 58.43 File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
#19 58.43 self.build_extension(ext)
#19 58.43 File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
#19 58.43 _build_ext.build_extension(self, ext)
#19 58.43 File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
#19 58.43 objects = self.compiler.compile(sources,
#19 58.43 File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
#19 58.43 _write_ninja_file_and_compile_objects(
#19 58.43 File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
#19 58.43 _run_ninja_build(
#19 58.43 File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
#19 58.44 raise RuntimeError(message) from e
#19 58.44 RuntimeError: Error compiling objects for extension
------
executor failed running [/bin/sh -c python3 setup_cuda.py bdist_wheel -d .]: exit code: 1
It looks like it wants this patch: https://github.com/qwopqwop200/GPTQ-for-LLaMa/pull/58
Bumping the GPTQ SHA to 841feedde876785bc8022ca48fd9c3ff626587e2 gets past this
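In the Dockerfile that amounts to checking out the newer commit after the clone, along these lines (a sketch; the clone destination path is illustrative, the SHA is the one from this thread):

```dockerfile
# Pin GPTQ-for-LLaMa to a commit that includes the atomicAdd fix for
# pre-Pascal (compute capability < 6.0) cards.
RUN git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa /build/GPTQ-for-LLaMa && \
    cd /build/GPTQ-for-LLaMa && \
    git checkout 841feedde876785bc8022ca48fd9c3ff626587e2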
@deece have you tried setting the specific TORCH_CUDA_ARCH_LIST in the docker-compose to what your graphics card needs? The error you posted indicates that you didn't.
Yup, my oldest card is an M40, which requires that patch.
@deece with M40 do you mean a Quadro M4000 ?
Tesla M40. I also have a Tesla K80, but it doesn't really get used.
https://developer.nvidia.com/cuda-gpus <- based on that docs page, your M40 expects compute capability 5.2. Try changing TORCH_CUDA_ARCH_LIST from 7.5 to 5.2 in the docker-compose.yml.
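The card-to-capability mapping comes straight from that NVIDIA page; as a rough helper (an illustrative subset only, double-check your own card against the table), picking the value could look like:

```shell
# Map a GPU name to its CUDA compute capability, per
# https://developer.nvidia.com/cuda-gpus (illustrative subset only).
arch_for_gpu() {
  case "$1" in
    "Tesla K80")        echo "3.7" ;;
    "Tesla M40")        echo "5.2" ;;
    "Quadro M4000")     echo "5.2" ;;
    "GeForce RTX 2080") echo "7.5" ;;
    *)                  echo "unknown" ;;
  esac
}

arch_for_gpu "Tesla M40"   # prints 5.2
```

The result can then be exported before the compose build, e.g. `TORCH_CUDA_ARCH_LIST=$(arch_for_gpu "Tesla M40") docker compose build`.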
I don't think that will work: the patch mentioned above suggests it will break for anything under 6.0.
That patch does work, though, and all that is needed to get it is to roll the pinned commit forward a bit (I tested the current HEAD and that worked).
@deece I tried your suggested SHA 841feedde876785bc8022ca48fd9c3ff626587e2 and HEAD, which made it fail with load_quant() missing 1 required positional argument: 'pre_layer'
I've updated the PR and moved all configs into an .env file, which might make it easier to test/compare.
Thanks, I'm out all day tomorrow, but I'll have another crack at it on Monday.
@deece it now uses HEAD; I updated it to work with the new changes ( https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode ).
How about also preloading extensions into the docker image?
@MarlinMr I mapped the extensions folder (and a few others) in the docker-compose.
Yeah, that makes sense for local configuration. But I was thinking more like pulling dependencies for the currently supported extensions into the docker image.
@MarlinMr it now runs pip3 install for the extensions too, using the same caching as the others; I also added port 5000 for the API via docker-compose.
@oobabooga mind merging this? It would make it easier to hop branches and test in docker.
It might be worth squashing/refactoring the commits before merging the PR. Maybe even squashing it down to a single commit?
There are a couple of missing variables in the sample env file:
WARNING: The HOST_API_PORT variable is not set. Defaulting to a blank string.
WARNING: The CONTAINER_API_PORT variable is not set. Defaulting to a blank string.
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.text-generation-webui.ports contains an invalid type, it should be a number, or an object
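The missing entries would presumably look something like the following .env additions (a sketch; 5000 matches the API port mentioned earlier in the thread, adjust as needed):

```shell
# .env additions (sketch) for the API port mapping
HOST_API_PORT=5000
CONTAINER_API_PORT=5000
```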
Yeah, this PR has turned into a bit of a mess. I'll close this one and create a new, clean one:
https://github.com/oobabooga/text-generation-webui/pull/633