LocalAI
Example Chat-UI (ChatGPT OSS Alternative) causing crash of API with preloaded model
LocalAI version: quay.io/go-skynet/local-ai:latest
Environment, CPU architecture, OS, and Version: IBM x3400 Server with
- VMware Host (x86-64 CPU Arch)
- VM Guest: Ubuntu 20.04 (x86-64 CPU Arch)
- Docker version 24.0.2, build cb74dfc
- docker-compose version 1.29.2
Describe the bug: I'm new to LocalAI and was trying to set up the "ChatGPT OSS Alternative" example presented on the LocalAI homepage: https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui
At first the LocalAI API appears to run fine, but sending any prompt from the chat UI to the API crashes it (see attached logs).
To Reproduce: Try the example at https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui
This is the docker-compose.yaml I ended up with while adapting it:
```yaml
version: '3.8'

services:
  api:
    # https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes
    #image: quay.io/go-skynet/local-ai:v1.18.0
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: .
      dockerfile: Dockerfile
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    ports:
      - 8080:8080
    env_file:
      - .env
    environment:
      #- DEBUG=true
      - MODELS_PATH=/models
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]'
    volumes:
      - "./models:/models:cached"
    command: ["/usr/bin/local-ai"]

  chatgpt:
    depends_on:
      api:
        condition: service_healthy
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3000:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://api:8080'
```
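To rule out chatbot-ui itself, the API can be exercised directly with curl once the healthcheck passes. A minimal sketch, assuming the standard OpenAI-compatible endpoints LocalAI exposes and the gpt-3.5-turbo model name preloaded in the compose file above:

```bash
# Confirm the API reports ready (same endpoint the healthcheck uses):
curl -f http://localhost:8080/readyz

# List the models LocalAI knows about:
curl http://localhost:8080/v1/models

# Send a chat completion directly, bypassing chatbot-ui entirely:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}'
```

If the last request alone crashes the container, the problem is on the API/model side rather than in the UI.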
Expected behavior: A working example producing at least some output to the ChatGPT-like prompt. Instead, only an "internal error" response pops up.
Logs: Log file from the Docker container (attached).
Additional context:
Possibly related to these issues as well: #195, #192
Just leaving this here in case others hit similar problems ... apparently my Docker machine did not have enough RAM assigned, causing the crash when loading the models into memory. I'm trying with more memory assigned to the VM and will report back if that fixes it.
Tried with 16 GB of RAM assigned; the LocalAI API container still crashes, without a useful exception pointing out what's going wrong.
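To tell an out-of-memory kill apart from some other kind of crash, the kernel log and the container's exit state are worth checking. A quick sketch; the container name is a placeholder:

```bash
# The kernel OOM killer leaves a trace in the log when it kills a process:
dmesg | grep -i -E 'out of memory|killed process'

# Docker records whether the container was OOM-killed, plus its exit code
# (137 usually means SIGKILL, often from the OOM killer):
docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <api-container-name>
```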
I've now cross-checked by deploying the same docker-compose setup on my notebook workstation (Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz) running Ubuntu and Docker. There it works!
The previous deployment that caused problems was on my IBM server, which runs VMware ESXi on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz with an Ubuntu/Docker VM.
So either the local-ai stack has some kind of problem with VMware virtualisation, or with the Intel Xeon CPU, specifically the E5620 model?!
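The E5620 is a Westmere-era Xeon, which predates AVX (AVX arrived with Sandy Bridge), so a CPU limitation is plausible. Whether the VM's virtual CPU actually exposes AVX can be checked from inside the guest:

```bash
# Lists the AVX-related feature flags the (virtual) CPU advertises.
# Empty output means AVX instructions will trap with SIGILL.
grep -o -w -E 'avx[a-z0-9_]*' /proc/cpuinfo | sort -u
```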
I have a Xeon E5649 CPU and see the same issue with the API crashing. I suspect the CPU is incompatible.
Server specs:
- Dell R710
- 96 GB RAM
- 2x Xeon E5649, 12 cores @ 2.53GHz
- 28 TB storage
- Ubuntu 20.04 LTS, 5.4.0-86-generic kernel
My docker-compose file:
```yaml
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    # As initially LocalAI will download the models defined in PRELOAD_MODELS
    # you might need to tweak the healthcheck values here according to your network connection.
    # Here we give a timespan of 20m to download all the required files.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 20
    build:
      context: ./
      dockerfile: Dockerfile
    ports:
      - 8050:8080
    environment:
      - DEBUG=true
      - REBUILD=true
      - BUILD_TYPE=generic
      - MODELS_PATH=/models
      - THREADS=14
      - CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
      # You can preload different models here as well.
      # See: https://github.com/go-skynet/model-gallery
      - 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/mpt-7b-chat.yaml", "name": "mpt-7b-chat"},{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}, { "url": "github:go-skynet/model-gallery/bert-embeddings.yaml", "name": "text-embedding-ada-002"},{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
    volumes:
      - ./models:/models:cached
    command: ["/usr/bin/local-ai"]

  chatgpt:
    depends_on:
      api:
        condition: service_healthy
    image: ghcr.io/mckaywrigley/chatbot-ui:main
    ports:
      - 3500:3000
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://api:8080'
    volumes:
      - ./models:/models:cached
```
Failure message (there is additional output I can provide, but I'll truncate it here as this seems the most relevant):
```
5:53PM DBG Loading model llama from WizardLM-7B-uncensored.ggmlv3.q5_1
5:53PM DBG Loading model in memory from file: /models/WizardLM-7B-uncensored.ggmlv3.q5_1
SIGILL: illegal instruction
PC=0xa1ab80 m=9 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xc5 0xf9 0x6f 0x5 0x98 0xbe 0x8c 0x0 0xc7 0x47 0x10 0x0 0x0 0x0 0x0 0x48
```
Note: I have tried multiple models. Best case, they return no response; worst case, they crash like this. I would love to get this working on my server just for funsies, but I'm pretty sure the CPU is the limiting factor here. I know for a fact it does not have AVX, so that's a bad sign from the get-go.
This is most likely caused by missing AVX support (note that the faulting instruction bytes start with 0xc5, a VEX prefix used by AVX-encoded instructions). You can compile local-ai on this machine to get a version optimized for it.
For anyone else scouring the issues for a solution: build it locally as bnusunny mentions above, like this (though it does make it incredibly slow):

```bash
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build
```
https://github.com/go-skynet/LocalAI/issues/288#issuecomment-1580305158
Yeah, without AVX, ggml will be slow on CPU.
> For anyone else scouring the issues for a solution: build it locally as bnusunny mentions above, like this (though it does make it incredibly slow):
> `CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build`
Does this work in the Docker container? For me, I got it to run locally with just CMAKE_ARGS="-DLLAMA_AVX2=OFF", but it was still crashing in the Docker container.
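One thing worth double-checking in the compose file above (an assumption on my part, not something confirmed in this thread): with list-style environment entries, docker-compose passes the value verbatim, so the double quotes around the CMAKE_ARGS flags end up inside the variable. Dropping the quotes, or passing the variables via docker run, sidesteps that. A sketch using the REBUILD mechanism from the compose file:

```bash
# Rebuild LocalAI from source inside the container at startup, with the
# F16C/AVX512/AVX2/FMA code paths disabled; note there are no stray quotes
# embedded in the CMAKE_ARGS value itself.
docker run --rm -p 8080:8080 \
  -e REBUILD=true \
  -e BUILD_TYPE=generic \
  -e CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" \
  -v "$PWD/models:/models" \
  quay.io/go-skynet/local-ai:latest
```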
:warning::warning::warning::warning::warning:
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!
But... I can also be funny or helpful :smile_cat:, and I can provide generally good tips on where to look in the documentation or the code, based on what you wrote in the issue.
Don't engage in conversation with me; I don't support replying (yet)!
:warning::warning::warning::warning::warning:
I see that the issue is related to the LocalAI API image version. The latest version of LocalAI is 1.20.0, and the image used in the example is 1.18.0. This could be causing the issue. Please try updating the image tag in the `api` service in your `docker-compose.yaml` file to use the latest version of the LocalAI API image:

```yaml
image: quay.io/go-skynet/local-ai:latest
```
Also, make sure that you have installed all the required dependencies and packages for LocalAI on your system. You can do this by following the installation instructions provided in the LocalAI documentation.
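A minimal way to apply that suggestion with the docker-compose v1 CLI used in this setup:

```bash
# Pull the newest image for the api service, then recreate the stack:
docker-compose pull api
docker-compose up -d
```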
Sources:
- https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui-manual/docker-compose.yaml
- https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui-manual/README.md
- https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui/README.md