Manual Model Execution Error: Troubleshooting GPU Acceleration
When trying to use GPU acceleration locally, I encountered an error. Here are the steps I took. Could you please help identify where I might have gone wrong in my steps or configuration? The manual execution guide in the documentation doesn't seem detailed enough, especially regarding how to enable GPU-related configuration!
1. Started ./local-ai-cuda12 with the command below:
./local-ai-cuda12
8:01PM INF Setting logging to info
8:01PM INF Starting LocalAI using 4 threads, with models path: /root/autodl-tmp/local-ai/models
8:01PM INF LocalAI version: v2.14.0 (b58274b8a26a3d22605e3c484cf39c5dd9a5cf8e)
8:01PM INF Preloading models from /root/autodl-tmp/local-ai/models
Model name: hermes-2-pro-llama-3-8b:Q8_0
Model name: llama3-8b-instruct
8:01PM ERR error establishing configuration directory watcher error="unable to establish watch on the LocalAI Configuration Directory: no such file or directory"
8:01PM INF core/startup process completed!
8:01PM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
8:05PM ERR Server error error="json: unsupported type: map[interface {}]interface {}" ip=127.0.0.1 latency=1.134510552s method=GET status=500 url=/models/available
8:06PM INF Success ip=127.0.0.1 latency="281.911µs" method=GET status=200 url=/v1/models
8:07PM INF Trying to load the model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper
8:07PM INF [llama-cpp] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend llama-cpp
8:07PM INF [llama-cpp] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
8:07PM INF [llama-ggml] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend llama-ggml
8:07PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
8:07PM INF [gpt4all] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend gpt4all
8:07PM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
8:07PM INF [bert-embeddings] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend bert-embeddings
8:07PM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
8:07PM INF [rwkv] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend rwkv
8:07PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
8:07PM INF [whisper] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend whisper
8:07PM INF [whisper] Fails: could not load model: rpc error: code = Unknown desc = unable to load model
8:07PM INF [stablediffusion] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend stablediffusion
8:07PM INF [stablediffusion] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
8:07PM INF [tinydream] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend tinydream
8:07PM INF [tinydream] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/tinydream. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
8:07PM INF [piper] Attempting to load
8:07PM INF Loading model 'Hermes-2-Pro-Llama-3-8B-Q8_0.gguf' with backend piper
8:07PM INF [piper] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/piper. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
8:07PM ERR Server error error="could not load model - all backends returned error: 9 errors occurred:\n\t* could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n\t* could not load model: rpc error: code = Unknown desc = unable to load model\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/tinydream. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/piper. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\n" ip=127.0.0.1 latency=15.442258698s method=POST status=500 url=/v1/chat/completions
2.1 Listed the available models:
curl http://localhost:8080/v1/models
{"object":"list","data":[{"id":"hermes-2-pro-llama-3-8b:Q8_0","object":"model"},{"id":"llama3-8b-instruct","object":"model"},{"id":"Meta-Llama-3-8B-Instruct.Q4_0.gguf","object":"model"}]}
2.2 Sent a chat completion request:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
> "model": "hermes-2-pro-llama-3-8b:Q8_0",
> "messages": [{"role": "user", "content": "Who are you?"}]
> }'
Then I got this error:
{"error":{"code":500,"message":"could not load model - all backends returned error: 9 errors occurred:\n\t* could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unknown desc = failed loading model\n\t* could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n\t* could not load model: rpc error: code = Unknown desc = unable to load model\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/tinydream. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\t* grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/piper. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n\n","type":""}}
nvidia-smi
Thu May 9 20:14:26 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:16:00.0 Off | Off |
| 30% 29C P8 16W / 450W | 402MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
Hello WuQic, I am an experiment of @mudler and I am here to help you with your issue. I apologize for any inconvenience caused.
Upon reviewing the provided error logs and your configuration details, it seems there might be an issue with the LocalAI configuration or the setup of your GPU. Here are a few points to consider:
- GPU Setup: It appears that your GPU, NVIDIA GeForce RTX 4090, is recognized by the system. However, it's recommended to ensure that your GPU is compatible with the required CUDA version (12.2) and the LocalAI dependencies. You can verify your GPU compatibility by checking the manufacturer's specifications.
- Missing Configuration Directory: The error log indicates that there is no LocalAI Configuration Directory, which may cause the configuration directory watcher to fail. Make sure you have created the necessary configuration files in the expected directory (/root/autodl-tmp/local-ai/models). If you have created them, it is possible that they are not named correctly or are located in a different directory.
- Loading Models with Backends: The error log shows that the models are not being loaded successfully with the specified backends. To troubleshoot this issue, you can try the following (see the model configuration sketch after this list):
  - Check the model compatibility with the selected backends. Ensure that the models and backends are properly configured and matched.
  - Verify that the required dependencies for the backends (e.g., stablediffusion, tinydream, piper) are correctly installed and configured.
- GPU-related Dependencies: The error log also mentions a problem with grpc processes being unavailable for some backends. This could be due to missing dependencies or misconfigured setups. Ensure that all required dependencies for LocalAI and the backends are properly installed.
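To make the "properly configured and matched" point concrete, here is a minimal model configuration sketch. The fields (name, backend, parameters.model, f16, gpu_layers) follow the usual LocalAI model YAML format, but the file name and the gpu_layers value are assumptions; if a YAML for this model already exists from the preload step, these are the fields to check or add in it:
# hypothetical file name; the YAML sits next to the .gguf in the models directory
cat > /root/autodl-tmp/local-ai/models/hermes-2-pro-llama-3-8b.yaml <<'EOF'
name: hermes-2-pro-llama-3-8b:Q8_0
backend: llama-cpp
parameters:
  model: Hermes-2-Pro-Llama-3-8B-Q8_0.gguf
f16: true
# number of layers offloaded to the GPU; assumed value, tune it to your VRAM
gpu_layers: 33
EOF
With backend: llama-cpp pinned, the server should stop cycling through all nine backends for this model, and a debug run will show how many layers actually land on the GPU.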
To resolve the issue, you can try the following steps:
- Verify that your GPU is compatible with the required CUDA version, and all necessary GPU-related dependencies are installed.
- Ensure that the LocalAI Configuration Directory exists and contains the required configuration files (a sketch follows after these steps).
- Double-check the model compatibility with the selected backends and ensure that the backends' dependencies are correctly configured.
- If the issue persists, please share more details about your system (e.g., operating system, LocalAI version, etc.) for further assistance.
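A minimal sketch for the configuration-directory step, assuming the directory can be passed through the LOCALAI_CONFIG_DIR environment variable (this variable name is an assumption; check ./local-ai-cuda12 --help for the exact flag on your build):
# create the directory the watcher complains about (path and variable name assumed)
mkdir -p /root/autodl-tmp/local-ai/configuration
LOCALAI_CONFIG_DIR=/root/autodl-tmp/local-ai/configuration ./local-ai-cuda12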
I hope this information is helpful. Let me know if you have any further questions or concerns.
Could you please share an example demonstrating how to initiate deployment using LocalAI binary + GPU?
Can you activate the debug log level? Debug messages will help in determining the reason why the backends fail to load the models. Can you paste a debug log? If you do that, you can see if the GPU is used, how many layers of the model are offloaded, and how many MB of RAM it is trying to allocate on the GPU... Launch the binary with --debug
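For example, either of these should enable it (same binary as above; the environment-variable form matches what is used later in this thread):
# print debug logs so the real reason each backend fails is visible
./local-ai-cuda12 --debug
# or, equivalently, via the log-level environment variable
LOCALAI_LOG_LEVEL=debug ./local-ai-cuda12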
I solved it with the latest version:
# create dir
mkdir -p /opt/apps/local-ai /opt/apps/local-ai/models /opt/apps/local-ai/configuration
cd /opt/apps/local-ai
# Download the CUDA 12 binary
wget https://github.com/mudler/LocalAI/releases/download/v2.15.0/local-ai-cuda12-Linux-x86_64
# chmod
chmod +x local-ai-cuda12-Linux-x86_64
# run (keep the env vars on the same command line so they are passed to the binary)
LOCALAI_LOG_LEVEL=debug ADDRESS=":6006" ./local-ai-cuda12-Linux-x86_64
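To confirm the GPU is actually being used, the earlier checks can be rerun against the new port (":6006" from ADDRESS above; the model name must match one returned by /v1/models):
# send a test request on the new port
curl http://localhost:6006/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "hermes-2-pro-llama-3-8b:Q8_0",
  "messages": [{"role": "user", "content": "Who are you?"}]
}'
# while the request is running, GPU memory in nvidia-smi should climb well above the idle ~400MiB
nvidia-smi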