
Error when running model other than 7b

Open Impre-visible opened this issue 1 year ago • 8 comments

Hi, I wanted to try the code-7b model, but I got this error:

llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-api-1  | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-llama-gpt-api-1  | !!
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-1  |         Please avoid running ``setup.py`` and ``easy_install``.
llama-gpt-llama-gpt-api-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-1  |         standards-based tools.
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  |         See https://github.com/pypa/setuptools/issues/917 for details.
llama-gpt-llama-gpt-api-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | !!
llama-gpt-llama-gpt-api-1  |   easy_install.initialize_options(self)
llama-gpt-llama-gpt-api-1  | [0/1] Install the project...
llama-gpt-llama-gpt-api-1  | -- Install configuration: "Release"
llama-gpt-llama-gpt-api-1  | -- Up-to-date: /app/_skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-llama-gpt-api-1  | copying _skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | running develop
llama-gpt-llama-gpt-api-1  | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-llama-gpt-api-1  | !!
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-1  |         Please avoid running ``setup.py`` directly.
llama-gpt-llama-gpt-api-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-1  |         standards-based tools.
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  |         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-llama-gpt-api-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | !!
llama-gpt-llama-gpt-api-1  |   self.initialize_options()
llama-gpt-llama-gpt-api-1  | running egg_info
llama-gpt-llama-gpt-api-1  | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-llama-gpt-api-1  | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-llama-gpt-api-1  | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-llama-gpt-api-1  | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-llama-gpt-api-1  | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-1  | adding license file 'LICENSE.md'
llama-gpt-llama-gpt-api-1  | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-1  | running build_ext
llama-gpt-llama-gpt-api-1  | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-llama-gpt-api-1  | llama-cpp-python 0.1.80 is already the active version in easy-install.pth
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | Installed /app
llama-gpt-llama-gpt-api-1  | Processing dependencies for llama-cpp-python==0.1.80
llama-gpt-llama-gpt-api-1  | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-1  | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-1  | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-1  | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-1  | Searching for numpy==1.26.0b1
llama-gpt-llama-gpt-api-1  | Best match: numpy 1.26.0b1
llama-gpt-llama-gpt-api-1  | Processing numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-1  | Adding numpy 1.26.0b1 to easy-install.pth file
llama-gpt-llama-gpt-api-1  | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages/numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-1  | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-1  | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-1  | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-1  | Finished processing dependencies for llama-cpp-python==0.1.80
llama-gpt-llama-gpt-api-1  | Initializing server with:
llama-gpt-llama-gpt-api-1  | Batch size: 1024
llama-gpt-llama-gpt-api-1  | Number of CPU threads: 12
llama-gpt-llama-gpt-api-1  | Number of GPU layers: 0
llama-gpt-llama-gpt-api-1  | Context window: 4096
llama-gpt-llama-gpt-api-1  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-1  | 
llama-gpt-llama-gpt-api-1  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-1  |   warnings.warn(
llama-gpt-llama-gpt-api-1 exited with code 139

This only happens with models other than 7b. I tried code-7b, code-13b, and 13b, and got the same error each time. How can I resolve this?

Impre-visible avatar Oct 21 '23 17:10 Impre-visible

Same here.

I ran 7b first, so I wondered whether it was actually "error when running any model other than the one you ran first". To narrow it down, I deleted everything and started over with code-7b as my first try. Even as the first model, it fails in exactly the same way, so I think the original title is correct.

I'm running a fresh install of WSL on Win10 Pro 22H2 with Ubuntu as my chosen distro and a fresh install of Docker Desktop per their instructions. AMD 5600G with 64GB of RAM, in case that matters.

myself248 avatar Oct 22 '23 15:10 myself248

This is not an error, just a warning. I just installed llama-gpt myself and had the same feeling that something wasn't right. Without a GPU the chat will be really slow; however, I found that setting llama-cpp-python to version 0.2.7 gave me quite a boost in performance (I am running it on an RTX 3090 with Vicuna 13B).

For comparison, my Threadripper 2970WX was crunching through about 4 tokens per second for the prompt (for some reason this is the longest part of the process; the response runs at about 30 tokens per second). Setting llama-cpp-python to version 0.2.7 and using the GPU bumped it up to 994 tokens per second.

P.S. If you are using an NVIDIA GPU with CUDA support, you might want to bump the number of layers offloaded to the GPU from 10 to 50; this greatly increases processing speed. Edit cuda/run.sh and change the value of n_gpu_layers (see the sketch below).
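
Assuming cuda/run.sh launches the llama-cpp-python server, the relevant part would look roughly like this; the model path and exact flags are assumptions for illustration, not the file's verbatim contents:

    # Sketch of the relevant part of cuda/run.sh (model path is assumed).
    # n_gpu_layers controls how many transformer layers are offloaded to the GPU;
    # raising it from 10 to e.g. 50 keeps most of the model on the card.
    python3 -m llama_cpp.server \
        --model /models/code-llama-2-7b.gguf \
        --n_ctx 4096 \
        --n_gpu_layers 50

Remember to rebuild the API image after editing so the change actually takes effect.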

darki73 avatar Oct 27 '23 13:10 darki73

Same here. Launching it in Docker with 7b runs fine. With the code-* models it's an infinite loop with the error above. All model files seem to have downloaded successfully into /models.

proll avatar Dec 20 '23 03:12 proll

Only the 7b model works for me (with and without CUDA, on Windows 11 using WSL); 13b goes into an infinite loop trying to run.

s-github-2 avatar Dec 20 '23 07:12 s-github-2

When trying ./run.sh --model code-7b --with-cuda I am also seeing this: UserWarning: Field "model_alias" has conflict with protected namespace "model_". Although that looks like just a warning, I also see a line that says llama-gpt-llama-gpt-api-cuda-gguf-1 | make: *** No rule to make target 'build'. Stop., followed by exited with code 139. I wonder if something is missing from the Makefile?

The llama-gpt-llama-gpt-api-cuda-gguf-1 service is flapping, which leads me to believe there truly is something wrong here.

rvarner avatar Jan 28 '24 18:01 rvarner

An infinite loop might indicate that you don't have enough VRAM (the problem is that when a model can offload, for example, 43 layers and you set n_gpu_layers to 43, it will try to keep all of those layers in GPU memory).

There are a couple of potential reasons for this issue:

  1. The context is too big. Even when a model needs 22GB to run, it also takes up a chunk of memory for the context, so if you go crazy with something like 8192 for a model of that size, expect the context to take at least 4GB. In total that's 22 + 4 = 26GB on a 24GB GPU, and there you go: infinite loop.
  2. The model you are trying to use is simply too big for the amount of memory you have. This can also happen when you offload too many layers; the suggestion would be to start with n_gpu_layers set to 1 and increment it by 1 until it starts to crash again.
  3. Take a look at the supported models; the issue might be right there. I've tried multiple variants of Vicuna, but the only one that loaded fine was the GGUF Q4_K_M version; all the others are a gamble, they might load now but then throw an error (or not load at all).

These are most likely not issues with the repo or the underlying libraries; it is probably one of the three reasons mentioned above.

P.S. Here is where you should look for the possible issue/solution: the context length and GPU offloading settings in cuda/run.sh.

For starters, change the context to something like 2048 and see if it works; if not, set n_gpu_layers to 1. Don't forget to rebuild the images for the changes to take effect (a sketch is below).
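
As a rough illustration of that conservative starting point, again assuming cuda/run.sh launches the llama-cpp-python server (the variable, compose file, and service names below are assumptions inferred from the logs in this thread, not verbatim project files):

    # cuda/run.sh -- conservative starting values.
    # Small context window to leave VRAM headroom, and only one layer on the GPU;
    # raise n_gpu_layers step by step until it crashes, then back off by one.
    python3 -m llama_cpp.server \
        --model "$MODEL" \
        --n_ctx 2048 \
        --n_gpu_layers 1

    # Rebuild the API image so the edited script is picked up, then start again.
    docker compose -f docker-compose-cuda-gguf.yml build llama-gpt-api-cuda-gguf
    ./run.sh --model code-7b --with-cuda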

darki73 avatar Jan 28 '24 23:01 darki73

Update to the latest version, 0.2.60; it can resolve this issue.

TurboLeiGlobalPay avatar Apr 09 '24 06:04 TurboLeiGlobalPay

Updating works great, thanks @TurboLeiGlobalPay.

Just update this line in your docker-compose files:

image: ghcr.io/abetlen/llama-cpp-python:latest@sha256:de0fd227f348b5e43d4b5b7300f1344e712c14132914d1332182e9ecfde502b2

Replace it with:

image: ghcr.io/abetlen/llama-cpp-python:v0.2.63
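
If you would rather script the change, something along these lines works (a sketch; review the resulting diff, since the compose file names in your checkout may differ):

    # Swap the digest-pinned "latest" image for the v0.2.63 tag in every compose
    # file that references llama-cpp-python, then pull the new image.
    sed -i 's|ghcr.io/abetlen/llama-cpp-python:latest@sha256:[a-f0-9]*|ghcr.io/abetlen/llama-cpp-python:v0.2.63|' docker-compose*.yml
    docker compose pull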

Ualas avatar Apr 22 '24 23:04 Ualas