LocalAI
Fresh install, getting error 500: grpc service not ready, on both exllama and AutoGPTQ
LocalAI version: Commit 2bacd0180d409b2b8f5c6f1b1ef13ccfda108c48
Environment, CPU architecture, OS, and Version:
CPU Architecture: x86_64, OS: Arch Linux, Version: 6.3.8-arch1-1
Describe the bug
I followed the instructions here:
https://localai.io/model-compatibility/exllama/
and got this as the output of the curl request:
{
  "error": {
    "code": 500,
    "message": "grpc service not ready",
    "type": ""
  }
}
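For reference, the request was along these lines (a sketch adapted from the linked exllama page, using LocalAI's OpenAI-compatible chat endpoint; the port comes from the startup banner below and the model name from my config):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "exllama",
    "messages": [{"role": "user", "content": "How are you?"}],
    "temperature": 0.1
  }'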
To Reproduce
Fresh install, then follow the instructions on either of these pages:
https://localai.io/model-compatibility/exllama/
https://localai.io/model-compatibility/autogptq/
Expected behavior
The LLM output.
Logs
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name : AMD Ryzen 7 2700 Eight-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
CPU: AVX found OK
CPU: AVX2 found OK
CPU: no AVX512 found
@@@@@
2:11AM INF Starting LocalAI using 4 threads, with models path: /models
2:11AM INF LocalAI version: v1.24.1 (9cc8d9086580bd2a96f5c96a6b873242879c70bc)
┌───────────────────────────────────────────────────┐
│ Fiber v2.48.0 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 55 Processes ........... 1 │
│ Prefork ....... Disabled PID ................ 14 │
└───────────────────────────────────────────────────┘
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42127: connect: connection refused"
(previous line repeated 10 times in total)
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37499: connect: connection refused"
(previous line repeated 10 times in total)
Additional context
My models directory looks like this:
[privateserver@privateserver models]$ tree
.
├── exllama.yaml
└── TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ
    ├── config.json
    ├── generation_config.json
    ├── huggingface-metadata.txt
    ├── quantize_config.json
    ├── README.md
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── tokenizer.model
    ├── trainer_state.json
    └── Wizard-Vicuna-30B-Uncensored-GPTQ-4bit--1g.act.order.safetensors
2 directories, 12 files
exllama.yaml looks like this:
name: exllama
parameters:
  model: TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ
backend: exllama
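One quick sanity check (assuming the default port) is to confirm the config was registered before hitting the chat endpoint; the name field above is what goes into the model field of the request:

curl http://localhost:8080/v1/models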
:warning::warning::warning::warning::warning:
Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!
but... I can also be funny or helpful :smile_cat: and I can generally provide good tips or point to places to look in the documentation or the code, based on what you wrote in the issue.
Don't engage in conversation with me, I don't support (yet) replying!
:warning::warning::warning::warning::warning:
Sources:
- https://github.com/go-skynet/LocalAI/tree/master/go.mod
- https://github.com/go-skynet/LocalAI/tree/master/api/openai/completion.go
- https://github.com/go-skynet/LocalAI/tree/master/.github/ISSUE_TEMPLATE/bug_report.md
- https://localai.io/model-compatibility/exllama/index.html
Did this ever get solved? I really want to use LocalAI for GPU models (AWQ and GPTQ) but am having zero luck.
Same
same here
same
same
Same issue here. Any resolution?
Has it been resolved?
Please add logs from LocalAI running with the debug flag (--debug).
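For example, with the Docker setup from the docs (a sketch; the image tag and paths here are assumptions, adjust them to your install):

docker run -p 8080:8080 \
  -v $PWD/models:/models \
  -e DEBUG=true \
  quay.io/go-skynet/local-ai:v1.24.1 \
  --models-path /models --debug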