llama-gpt
Error when running model other than 7b
Hi, I wanted to try the code-7b model, but I got this error:
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-api-1 | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-llama-gpt-api-1 | !!
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | ********************************************************************************
llama-gpt-llama-gpt-api-1 | Please avoid running ``setup.py`` and ``easy_install``.
llama-gpt-llama-gpt-api-1 | Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-1 | standards-based tools.
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | See https://github.com/pypa/setuptools/issues/917 for details.
llama-gpt-llama-gpt-api-1 | ********************************************************************************
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | !!
llama-gpt-llama-gpt-api-1 | easy_install.initialize_options(self)
llama-gpt-llama-gpt-api-1 | [0/1] Install the project...
llama-gpt-llama-gpt-api-1 | -- Install configuration: "Release"
llama-gpt-llama-gpt-api-1 | -- Up-to-date: /app/_skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-llama-gpt-api-1 | copying _skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | running develop
llama-gpt-llama-gpt-api-1 | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-llama-gpt-api-1 | !!
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | ********************************************************************************
llama-gpt-llama-gpt-api-1 | Please avoid running ``setup.py`` directly.
llama-gpt-llama-gpt-api-1 | Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-1 | standards-based tools.
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-llama-gpt-api-1 | ********************************************************************************
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | !!
llama-gpt-llama-gpt-api-1 | self.initialize_options()
llama-gpt-llama-gpt-api-1 | running egg_info
llama-gpt-llama-gpt-api-1 | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-llama-gpt-api-1 | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-llama-gpt-api-1 | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-llama-gpt-api-1 | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-llama-gpt-api-1 | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-1 | adding license file 'LICENSE.md'
llama-gpt-llama-gpt-api-1 | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-1 | running build_ext
llama-gpt-llama-gpt-api-1 | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-llama-gpt-api-1 | llama-cpp-python 0.1.80 is already the active version in easy-install.pth
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | Installed /app
llama-gpt-llama-gpt-api-1 | Processing dependencies for llama-cpp-python==0.1.80
llama-gpt-llama-gpt-api-1 | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-1 | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-1 | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-1 | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-1 | Searching for numpy==1.26.0b1
llama-gpt-llama-gpt-api-1 | Best match: numpy 1.26.0b1
llama-gpt-llama-gpt-api-1 | Processing numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-1 | Adding numpy 1.26.0b1 to easy-install.pth file
llama-gpt-llama-gpt-api-1 | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | Using /usr/local/lib/python3.11/site-packages/numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-1 | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-1 | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-1 | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-1 | Finished processing dependencies for llama-cpp-python==0.1.80
llama-gpt-llama-gpt-api-1 | Initializing server with:
llama-gpt-llama-gpt-api-1 | Batch size: 1024
llama-gpt-llama-gpt-api-1 | Number of CPU threads: 12
llama-gpt-llama-gpt-api-1 | Number of GPU layers: 0
llama-gpt-llama-gpt-api-1 | Context window: 4096
llama-gpt-llama-gpt-api-1 | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-1 |
llama-gpt-llama-gpt-api-1 | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-1 | warnings.warn(
llama-gpt-llama-gpt-api-1 exited with code 139
This only happens when I don't use the 7b model; I tried code-7b, code-13b, and 13b, and got the same error each time. How can I resolve this?
Same here.
I ran 7b first, so I was wondering if it was actually "error when running any model other than the one you ran first". To narrow it down I deleted everything and started over with code-7b as my first try. Even as the first model, it fails in the exact same way, so I think the original title is correct.
I'm running a fresh install of WSL on Win10 Pro 22H2 with Ubuntu as my chosen distro, fresh install of Docker Desktop according to their instructions. AMD 5600G with 64GB of RAM in case that matters.
This is not an error but just a warning.
Just installed llama-gpt myself and had the same feeling that something isn't right. Without a GPU the chat will be really slow; however, I found that setting `llama-cpp-python` to version 0.2.7 gave me quite a boost in performance (I am running it on an RTX 3090 with Vicuna 13B). For comparison, my Threadripper 2970WX was crunching about 4 tokens per second for the prompt (this is the longest part of the process for some reason; the response is about 30 tokens per second), while setting `llama-cpp-python` to version 0.2.7 and using the GPU bumped it up to 994 tokens per second.
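If you want to confirm the GPU is actually being used after making a change like this, watching `nvidia-smi` on the host while a prompt is processed is a simple sanity check (this assumes the NVIDIA driver and its tools are installed on the host; it is not something the llama-gpt scripts do for you):

```sh
# Refresh GPU utilization and VRAM usage every second while the container processes a prompt.
# Run this on the host, not inside the container.
watch -n 1 nvidia-smi
```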
P.S. If you are using an NVIDIA GPU with CUDA support, you might want to bump the number of layers offloaded to the GPU from 10 to 50; this greatly increases processing speed. Edit `cuda/run.sh` and change the value of `n_gpu_layers`.
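As a rough illustration of the kind of edit meant here (the exact line in `cuda/run.sh` may look different in your checkout; the variable assignment below is an assumption, not the script's actual contents):

```sh
# cuda/run.sh (illustrative sketch only; check your copy for the real variable/line)
# Before: only 10 layers offloaded to the GPU
n_gpu_layers=10
# After: offload more layers for a large speedup, as long as they fit in VRAM
n_gpu_layers=50
```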
Same here. Launching it in Docker with `7b` runs fine. With `code-*` models it goes into an infinite loop with the error above. All model files seem to have been downloaded successfully into `/models`.
Only the 7b model works for me (with and without CUDA, on Windows 11 using WSL); 13b goes into an infinite loop trying to run.
When trying `./run.sh --model code-7b --with-cuda`, I am also seeing this: `UserWarning: Field "model_alias" has conflict with protected namespace "model_"`. Although that looks like just a warning, I also see a line that says `llama-gpt-llama-gpt-api-cuda-gguf-1 | make: *** No rule to make target 'build'. Stop.` followed by an `exited with code 139`. I wonder if there is something missing in the Makefile? The `llama-gpt-llama-gpt-api-cuda-gguf-1` service is flapping, which leads me to believe there truly is something wrong here.
An infinite loop might indicate that you don't have enough VRAM (the problem is that when a model is able to offload, for example, 43 layers and you set `n_gpu_layers` to 43, it will try to fit all of them into GPU memory). There are a couple of potential reasons for this issue:
- The context is too big. Even when a model needs 22GB to run, it also takes up a chunk of memory for the context, so if you go crazy with something like 8192 for a model of that size, expect the context to take at least 4GB. In total that is 22 + 4 = 26GB on a 24GB GPU, and there you go: infinite loop.
- The model you are trying to use is simply too big for the amount of memory you have. This can also be caused by offloading far too many layers; the suggestion would be to start with `n_gpu_layers` set to 1 and increment it by 1 until it starts to crash again.
- Take a look at the supported models; the issue might be right there. I've tried multiple variants of Vicuna, but the only one that managed to load just fine is indeed the GGUF Q4_K_M version; all the others are a gamble, they might load now but then throw an error (or not load at all).

These are not issues with the repo or the underlying libraries; it is most probably one of the three reasons mentioned above.
P.S. The places to look for the possible issue/solution are the context length and GPU offloading settings in `cuda/run.sh`. For starters, change the context to something like 2048 and see if it works; if not, set `n_gpu_layers` to 1 (a sketch of both changes follows below). Don't forget to rebuild the images for the changes to take effect.
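Here is a minimal sketch of the two changes described above, assuming `cuda/run.sh` exposes them as plain shell variables (the actual variable names and defaults in the repo may differ, so match them to your copy):

```sh
# cuda/run.sh (illustrative only; adjust to the actual variable names in your checkout)
context_window=2048   # start small and grow it only once the model loads reliably
n_gpu_layers=1        # then increment by 1 until it crashes, and back off by one

# Rebuild so the edited script is baked into the image before the next start,
# e.g. with docker compose, or however you normally launch llama-gpt:
docker compose build
```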
Update llama-cpp-python to the latest version, 0.2.60; that can resolve this issue.
Updating works great, thanks @TurboLeiGlobalPay. Just update this line in your docker-compose files:
image: ghcr.io/abetlen/llama-cpp-python:latest@sha256:de0fd227f348b5e43d4b5b7300f1344e712c14132914d1332182e9ecfde502b2
Replace with:
image: ghcr.io/abetlen/llama-cpp-python:v0.2.63
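If you have several compose files to touch, a one-liner like the following can do the swap; the `docker-compose*.yml` glob is an assumption about how the files are named in your checkout, so double-check it before running:

```sh
# Replace the sha-pinned :latest image with the released v0.2.63 tag in every matching compose file.
sed -i 's|ghcr.io/abetlen/llama-cpp-python:latest@sha256:[0-9a-f]*|ghcr.io/abetlen/llama-cpp-python:v0.2.63|g' docker-compose*.yml
```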