LocalAI
localai:v2.10.1-cublas-cuda12-core "Fails: grpc process not found"
LocalAI version:
localai:v2.10.1-cublas-cuda12-core

Environment, CPU architecture, OS, and Version:
Windows 11, RTX 4090, CUDA 12.3
2024-03-21 19:23:56 11:23AM INF [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
2024-03-21 19:23:56 11:23AM INF [/build/backend/python/coqui/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS

Describe the bug
I just ran the command "docker run -ti -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct" and sent a test prompt to port 8080; the server then returned the error above.

To Reproduce
The command I sent is:
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{ "model": "mixtral-instruct", "prompt": "How are you doing?" }'

Expected behavior
Logs
Additional context
Hi @testKKP can you post your model definition?
The core images do not include Python, so SentenceTransformers and Coqui are not available.
For your reference, the backends that require Python are listed below (a workaround sketch follows the list):
- autogptq
- bark
- coqui
- diffusers
- exllama
- exllama2
- mamba
- petals
- sentencetransformers
- transformers
- transformers-musicgen
- vall-e-x
- vllm
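A minimal workaround sketch, assuming the non-core companion tag localai/localai:v2.10.1-cublas-cuda12 exists (LocalAI's usual naming for the Python-enabled CUDA image) and that ./models holds your model files:

```bash
# Hedged sketch: the non-core image bundles the Python backends listed
# above; the tag name follows LocalAI's naming scheme but should be
# verified against the published tags before use.
docker run -ti -p 8080:8080 --gpus all \
  -v "$PWD/models:/build/models" \
  localai/localai:v2.10.1-cublas-cuda12 mixtral-instruct
```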
Same issue here, using the new all in one image
docker run -p 8080:8080 --gpus all --name local-ai -ti quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12
Request:
{ "model": "gpt-4", "messages": [{"role": "user",
"content": "How are you doing?", "temperature": 0.1}]}
Response:
{
  "error": {
    "code": 500,
    "message": "rpc error: code = Unknown desc = unimplemented",
    "type": ""
  }
}
Running Fedora Linux + Nvidia RTX3060
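As an aside on the request above: in the OpenAI-compatible chat API, temperature is a top-level field, not a per-message field. That is likely unrelated to the 500 (which comes from the backend), but a corrected request for retesting might look like this sketch:

```bash
# Sketch: "temperature" moved out of the message object to the top level
# of the request body, per the OpenAI-compatible chat schema.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "How are you doing?"}],
        "temperature": 0.1
      }'
```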
Hi @ThaDrone this is a different issue, since gpt-4 from the AIO images uses the llama.cpp backend, while it seems that @testKKP is using a Python (transformers?) backend.
Can you open a new issue with the output from docker logs local-ai?
Thanks for your reply. In fact, I don't know my model definition. I just ran the command provided on the official website, like this: "docker run -ti -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct". Everything else was done by Docker itself, so I don't know how to solve this.
There is a mixtral configuration in the examples directory: https://github.com/mudler/LocalAI/tree/master/examples/configurations/mixtral.
Download those files into your models directory along with the GGUF file.
The example uses mixtral-8x7b-instruct-v0.1.Q2_K.gguf: you can choose whichever you prefer, just remember to update the config. A download sketch follows.
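A hedged download sketch for the step above; the YAML filename and the Hugging Face repository are assumptions based on the example directory and the usual TheBloke GGUF uploads, so verify both URLs before running:

```bash
# Sketch under assumptions: filenames are illustrative; check the example
# directory listing (link above) for the actual configuration files.
cd models
wget https://raw.githubusercontent.com/mudler/LocalAI/master/examples/configurations/mixtral/mixtral.yaml
wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q2_K.gguf
```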
As a bot (experiment of @mudler), I would like to provide some guidelines to resolve the issue:
1. Make sure Docker is properly installed and running on your system.
2. Download the mixtral-instruct configuration files from the relevant GGUF link. You can choose any available GGUF file for mixtral-instruct.
3. Save the downloaded files in the models directory of your LocalAI installation.
4. Stop the currently running Docker container if it's still running. You can use the command: docker stop local-ai
5. Run the following command, replacing [gguf_file] with the name of the file you downloaded in step 2:
docker run -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct-[gguf_file]
If you followed the instructions in the example, the command should look like this:
docker run -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-8x7b-instruct-v0.1.Q2_K.gguf
If you need further assistance, please provide the output of the command and any additional logs while running LocalAI in debug mode (--debug or DEBUG=true).
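One gap worth flagging in the steps above: the container can only see the downloaded files if the host models directory is mounted into it. A hedged sketch, assuming the config and GGUF sit in ./models and the YAML names the model mixtral-instruct:

```bash
# Sketch: /build/models is the image's model path (visible in the logs
# further below); DEBUG=true enables the debug logging mentioned above.
docker run -ti -p 8080:8080 --gpus all \
  -v "$PWD/models:/build/models" \
  -e DEBUG=true \
  localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct
```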
> Same issue here, using the new all in one image [...]
Hello, I also had a similar problem. Did you solve it?
I am trying to debug the original error described in the OP's comment, since I am facing the same error running LocalAI with Docker locally. Unlike the OP, I am running the AIO image localai/localai:latest-aio-gpu-nvidia-cuda-12.
Here is what I found:
- Attempting to hit the text-embedding-ada-002 embeddings works fine, and loads the MiniLM model.
- Attempting to ask for a sentence-transformers model, e.g. sentence-transformers/distiluse-base-multilingual-cased-v1, results in an error (full logs below; reproduction requests sketched after this list).
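For reproducibility, a sketch of the two requests, matching the /embeddings endpoint and the "ping" input visible in the logs below:

```bash
# Works: resolves to the bundled MiniLM embedding model.
curl http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-ada-002", "input": "ping"}'

# Fails: triggers the backend cascade and grpc-process-not-found errors below.
curl http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "sentence-transformers/distiluse-base-multilingual-cased-v1", "input": "ping"}'
```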
References
Running top inside the container:
62 root 20 0 15.4g 924956 344376 S 0.3 0.9 0:20.87 python /build/backend/python/sentencetransformers/backend.py --addr 127.0.0.1:36433
408 root 20 0 7584852 246416 157428 S 0.3 0.3 0:00.74 /tmp/localai/backend_data/backend-assets/grpc/llama-ggml --addr 127.0.0.1:46483
424 root 20 0 2600784 128992 61288 S 0.3 0.1 0:02.40 /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback --addr 127.0.0.1:37325
1 root 20 0 6444992 1.8g 1.8g S 0.0 1.9 0:14.64 ./local-ai
257 root 20 0 4628 3856 3252 S 0.0 0.0 0:00.04 bash
360 root 20 0 2600784 128684 60972 S 0.0 0.1 0:02.43 /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2 --addr 127.0.0.1:42339
384 root 20 0 2600784 128728 61012 S 0.0 0.1 0:02.40 /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2 --addr 127.0.0.1:45619
417 root 20 0 1707144 31364 20684 S 0.0 0.0 0:00.22 /tmp/localai/backend_data/backend-assets/grpc/gpt4all --addr 127.0.0.1:46475
458 root 20 0 1725624 34776 23276 S 0.0 0.0 0:00.23 /tmp/localai/backend_data/backend-assets/grpc/piper --addr 127.0.0.1:45617
465 root 20 0 2378980 137008 62124 S 0.0 0.1 0:00.44 /tmp/localai/backend_data/backend-assets/grpc/whisper --addr 127.0.0.1:43485
472 root 20 0 5067556 76780 52172 S 0.0 0.1 0:02.96 /tmp/localai/backend_data/backend-assets/grpc/stablediffusion --addr 127.0.0.1:38627
508 root 20 0 1703892 29476 18736 S 0.0 0.0 0:00.23 /tmp/localai/backend_data/backend-assets/grpc/huggingface --addr 127.0.0.1:36347
515 root 20 0 1708320 31488 20384 S 0.0 0.0 0:00.25 /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings --addr 127.0.0.1:33783
529 root 20 0 7392 3392 2788 R 0.0 0.0 0:00.38 top
Relevant Logs when attempting to load the model
12:09PM INF [/build/backend/python/sentencetransformers/run.sh] Attempting to load
12:09PM INF Loading model 'sentence-transformers/distiluse-base-multilingual-cased-v1' with backend /build/backend/python/sentencetransformers/run.sh
12:09PM DBG Loading model in memory from file: /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1
12:09PM DBG Loading Model sentence-transformers/distiluse-base-multilingual-cased-v1 with gRPC (file: /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1) (backend: /build/backend/python/sentencetransformers/run.sh): {backendString:/build/backend/python/sentencetransformers/run.sh model:sentence-transformers/distiluse-base-multilingual-cased-v1 threads:12 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0008d2248 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:09PM INF [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
[...]
12:09PM ERR Server error error="could not load model - all backends returned error: [llama-cpp]: could not load model: rpc error: code = Canceled desc = \n[llama-cpp]: could not load model: rpc error: code = Canceled desc = \n[llama-ggml]: could not load model: rpc error: code = Unknown desc = failed loading model\n[gpt4all]: could not load model: rpc error: code = Unknown desc = failed loading model\n[llama-cpp-fallback]: could not load model: rpc error: code = Canceled desc = \n[piper]: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1 (should end with .onnx)\n[whisper]: could not load model: rpc error: code = Unknown desc = stat /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1: no such file or directory\n[stablediffusion]: could not load model: rpc error: code = Unknown desc = stat /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1: no such file or directory\n[rwkv]: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n[huggingface]: could not load model: rpc error: code = Unknown desc = no huggingface token provided\n[bert-embeddings]: could not load model: rpc error: code = Unknown desc = failed loading model\n[/build/backend/python/parler-tts/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/parler-tts/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/mamba/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/mamba/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/coqui/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/bark/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/exllama2/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/diffusers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/autogptq/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/openvoice/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/openvoice/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/sentencetransformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/vall-e-x/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/transformers-musicgen/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers-musicgen/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/exllama/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/transformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/petals/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/petals/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/vllm/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/sentencetransformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/rerankers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/rerankers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS" ip=10.0.0.13 latency=28.667421326s method=POST status=500 url=/embeddings
After manually placing the model under /build/models
12:32PM DBG Request received: {"model":"distiluse-base-multilingual-cased-v1","language":"","translate":false,"n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"repeat_last_n":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","size":"","prompt":null,"instruction":"","input":"ping","stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"backend":"","model_base_name":""}
12:32PM DBG guessDefaultsFromFile: not a GGUF file
12:32PM DBG Parameter Config: &{PredictionOptions:{Model:distiluse-base-multilingual-cased-v1 Language: Translate:false N:0 TopP:0xc001456d80 TopK:0xc001456d88 Temperature:0xc001456d90 Maxtokens:0xc001456dc0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc001456db8 TypicalP:0xc001456db0 Seed:0xc001456dd8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:0xc001456d78 Threads:0xc001456d70 Debug:0xc001456dd0 Roles:map[] Embeddings:0xc001456dd1 Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil>} PromptStrings:[] InputStrings:[ping] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc001456da8 MirostatTAU:0xc001456da0 Mirostat:0xc001456d98 NGPULayers:0xc001456dc8 MMap:0xc001456dd0 MMlock:0xc001456dd1 LowVRAM:0xc001456dd1 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc001456d68 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:}
12:32PM DBG Model already loaded in memory: distiluse-base-multilingual-cased-v1
12:32PM DBG Model 'distiluse-base-multilingual-cased-v1' already loaded
12:32PM ERR Server error error="rpc error: code = Unknown desc = unimplemented" ip=10.0.0.13 latency=5.563407ms method=POST status=500 url=/embeddings
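The final "unimplemented" error suggests the request was routed to a loaded backend that does not implement embeddings. A hedged sketch of an explicit model definition pinning the sentencetransformers backend; the field names follow LocalAI's usual YAML schema but are stated here from memory, so treat this as a starting point rather than a verified fix:

```bash
# Sketch: write a model definition into the mounted models directory and
# restart the container, then request embeddings by the definition's name.
cat > models/distiluse-embeddings.yaml <<'EOF'
name: distiluse-embeddings
backend: sentencetransformers
embeddings: true
parameters:
  model: sentence-transformers/distiluse-base-multilingual-cased-v1
EOF

curl http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "distiluse-embeddings", "input": "ping"}'
```

Note this only addresses routing; the grpc-process-not-found failures above indicate the Python backend assets were missing at the expected path, which may still require a rebuilt or non-core image as discussed earlier in the thread.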