localai:v2.10.1-cublas-cuda12-core "Fails: grpc process not found"

Open testKKP opened this issue 11 months ago • 8 comments

LocalAI version:

localai:v2.10.1-cublas-cuda12-core

Environment, CPU architecture, OS, and Version:

Windows 11, RTX 4090, CUDA 12.3

2024-03-21 19:23:56 11:23AM INF [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
2024-03-21 19:23:56 11:23AM INF [/build/backend/python/coqui/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS

Describe the bug

I just ran the command "docker run -ti -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct" and sent a test prompt to port 8080; the error above was returned to me.

To Reproduce

The command I send is:

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{ "model": "mixtral-instruct", "prompt": "How are you doing?" }'

Expected behavior

Logs

Additional context

testKKP avatar Mar 21 '24 11:03 testKKP

Hi @testKKP can you post your model definition? The core images do not include Python, so SentenceTransformers and Coqui are not available. For your reference, the backends that require Python are:

  • autogptq
  • bark
  • coqui
  • diffusers
  • exllama
  • exllama2
  • mamba
  • petals
  • sentencetransformers
  • transformers
  • transformers-musicgen
  • vall-e-x
  • vllm
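
If you need one of these backends, a non-core image is required. A minimal sketch of the adjusted command, assuming the matching non-core tag exists for this release (the tag name is an assumption, not confirmed in this thread):

# Hypothetical tag: the non-core CUDA 12 variant bundles the Python backends,
# unlike the "-core" image from the original command.
docker run -ti -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12 mixtral-instruct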

fakezeta avatar Mar 27 '24 11:03 fakezeta

Same issue here, using the new all-in-one image:

 docker run -p 8080:8080 --gpus all --name local-ai -ti quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12

Request:

{ "model": "gpt-4", "messages": [{"role": "user",     
  "content": "How are you doing?", "temperature": 0.1}]}

Response:

{
	"error": {
		"code": 500,
		"message": "rpc error: code = Unknown desc = unimplemented",
		"type": ""
	}
}

Running Fedora Linux + Nvidia RTX3060

ThaDrone avatar Mar 27 '24 12:03 ThaDrone

Hi @ThaDrone this is a different issue: gpt-4 from the AIO images uses the llama.cpp backend, while it seems that @testKKP is using a Python (transformers?) backend. Can you open a new issue with the output of docker logs local-ai?
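
For reference, a quick way to capture those logs (assuming the container was started with --name local-ai, as in the command above):

# Dump both stdout and stderr from the running container to a file.
docker logs local-ai > local-ai.log 2>&1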

fakezeta avatar Mar 27 '24 12:03 fakezeta

> Hi @ThaDrone this is a different issue: gpt-4 from the AIO images uses the llama.cpp backend, while it seems that @testKKP is using a Python (transformers?) backend. Can you open a new issue with the output of docker logs local-ai?

Thanks for your reply. In fact, I don't know my model definition. I just ran the command provided on the official website: "docker run -ti -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct". Everything else was done by Docker itself, so I don't know how to solve it.

testKKP avatar Mar 31 '24 02:03 testKKP

There is a mixtral configuration in the example directory: https://github.com/mudler/LocalAI/tree/master/examples/configurations/mixtral. Download those files into your models directory together with the GGUF file. The example uses mixtral-8x7b-instruct-v0.1.Q2_K.gguf: you can choose whatever you prefer, just remember to update the config.
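
A minimal sketch of that setup (the exact file names are assumptions; verify them against the linked example directory and pick your preferred GGUF):

# Hypothetical file names and sources: check the example directory before running.
cd models
wget https://raw.githubusercontent.com/mudler/LocalAI/master/examples/configurations/mixtral/mixtral.yaml
wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q2_K.gguf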

fakezeta avatar Mar 31 '24 21:03 fakezeta

As a bot (experiment of @mudler), I would like to provide some guidelines to resolve the issue:

  1. Make sure Docker is properly installed and running on your system.
  2. Download the mixtral-instruct configuration files together with a GGUF file. You can choose any available GGUF file for mixtral-instruct.
  3. Save the downloaded files in the models directory of your LocalAI installation.
  4. Stop the currently running Docker container if it's still running. You can use the command: docker stop local-ai
  5. Run the following command, replacing [gguf_file] with the name of the file you downloaded in step 2:
docker run -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct-[gguf_file]

Please replace [gguf_file] in the command with the actual filename you downloaded in step 2. If you followed the instructions in the example, the command should look like this:

docker run -p 8080:8080 --gpus all localai/localai:v2.10.1-cublas-cuda12-core mixtral-8x7b-instruct-v0.1.Q2_K.gguf

If you need further assistance, please provide the output of the command and any additional logs while running LocalAI in debug mode (--debug or DEBUG=true).
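
For example, a sketch of the debug form of the command (using the DEBUG environment variable mentioned above):

# Enable debug logging via the DEBUG environment variable.
docker run -p 8080:8080 --gpus all -e DEBUG=true localai/localai:v2.10.1-cublas-cuda12-core mixtral-instruct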

localai-bot avatar Apr 15 '24 17:04 localai-bot

> Same issue here, using the new all-in-one image
> [...]
> Running Fedora Linux + Nvidia RTX3060

Hello, I also had a similar problem, did you solve it?

qingfenghcy avatar Apr 22 '24 07:04 qingfenghcy

I am trying to debug the original error described in the OP's comment, since I am facing the same error running LocalAI with Docker locally. Unlike the OP, I am running the AIO image localai/localai:latest-aio-gpu-nvidia-cuda-12.

Here is what I found:

  1. Hitting the text-embedding-ada-002 embeddings endpoint works fine and loads the MiniLM model.
  2. Asking for a sentence-transformers model, e.g. sentence-transformers/distiluse-base-multilingual-cased-v1, results in an error (full logs below); a request sketch follows this list.
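
A sketch of the failing request (the endpoint path and the "ping" input are taken from the url=/embeddings field and the request dump in the logs below):

curl http://localhost:8080/embeddings -H "Content-Type: application/json" \
  -d '{ "model": "sentence-transformers/distiluse-base-multilingual-cased-v1", "input": "ping" }'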

References

Running top inside the container

     62 root      20   0   15.4g 924956 344376 S   0.3   0.9   0:20.87 python /build/backend/python/sentencetransformers/backend.py --addr 127.0.0.1:36433
    408 root      20   0 7584852 246416 157428 S   0.3   0.3   0:00.74 /tmp/localai/backend_data/backend-assets/grpc/llama-ggml --addr 127.0.0.1:46483
    424 root      20   0 2600784 128992  61288 S   0.3   0.1   0:02.40 /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback --addr 127.0.0.1:37325
      1 root      20   0 6444992   1.8g   1.8g S   0.0   1.9   0:14.64 ./local-ai
    257 root      20   0    4628   3856   3252 S   0.0   0.0   0:00.04 bash
    360 root      20   0 2600784 128684  60972 S   0.0   0.1   0:02.43 /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2 --addr 127.0.0.1:42339
    384 root      20   0 2600784 128728  61012 S   0.0   0.1   0:02.40 /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2 --addr 127.0.0.1:45619
    417 root      20   0 1707144  31364  20684 S   0.0   0.0   0:00.22 /tmp/localai/backend_data/backend-assets/grpc/gpt4all --addr 127.0.0.1:46475
    458 root      20   0 1725624  34776  23276 S   0.0   0.0   0:00.23 /tmp/localai/backend_data/backend-assets/grpc/piper --addr 127.0.0.1:45617
    465 root      20   0 2378980 137008  62124 S   0.0   0.1   0:00.44 /tmp/localai/backend_data/backend-assets/grpc/whisper --addr 127.0.0.1:43485
    472 root      20   0 5067556  76780  52172 S   0.0   0.1   0:02.96 /tmp/localai/backend_data/backend-assets/grpc/stablediffusion --addr 127.0.0.1:38627
    508 root      20   0 1703892  29476  18736 S   0.0   0.0   0:00.23 /tmp/localai/backend_data/backend-assets/grpc/huggingface --addr 127.0.0.1:36347
    515 root      20   0 1708320  31488  20384 S   0.0   0.0   0:00.25 /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings --addr 127.0.0.1:33783
    529 root      20   0    7392   3392   2788 R   0.0   0.0   0:00.38 top

Relevant Logs when attempting to load the model

12:09PM INF [/build/backend/python/sentencetransformers/run.sh] Attempting to load
12:09PM INF Loading model 'sentence-transformers/distiluse-base-multilingual-cased-v1' with backend /build/backend/python/sentencetransformers/run.sh
12:09PM DBG Loading model in memory from file: /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1
12:09PM DBG Loading Model sentence-transformers/distiluse-base-multilingual-cased-v1 with gRPC (file: /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1) (backend: /build/backend/python/sentencetransformers/run.sh): {backendString:/build/backend/python/sentencetransformers/run.sh model:sentence-transformers/distiluse-base-multilingual-cased-v1 threads:12 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0008d2248 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
12:09PM INF [/build/backend/python/sentencetransformers/run.sh] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
[...]
12:09PM ERR Server error error="could not load model - all backends returned error: [llama-cpp]: could not load model: rpc error: code = Canceled desc = \n[llama-cpp]: could not load model: rpc error: code = Canceled desc = \n[llama-ggml]: could not load model: rpc error: code = Unknown desc = failed loading model\n[gpt4all]: could not load model: rpc error: code = Unknown desc = failed loading model\n[llama-cpp-fallback]: could not load model: rpc error: code = Canceled desc = \n[piper]: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1 (should end with .onnx)\n[whisper]: could not load model: rpc error: code = Unknown desc = stat /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1: no such file or directory\n[stablediffusion]: could not load model: rpc error: code = Unknown desc = stat /build/models/sentence-transformers/distiluse-base-multilingual-cased-v1: no such file or directory\n[rwkv]: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF\n[huggingface]: could not load model: rpc error: code = Unknown desc = no huggingface token provided\n[bert-embeddings]: could not load model: rpc error: code = Unknown desc = failed loading model\n[/build/backend/python/parler-tts/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/parler-tts/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/mamba/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/mamba/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/coqui/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/bark/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/exllama2/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/diffusers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/autogptq/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/openvoice/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/openvoice/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/sentencetransformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/vall-e-x/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/transformers-musicgen/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers-musicgen/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/exllama/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/transformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/petals/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/petals/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/vllm/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/sentencetransformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/rerankers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/rerankers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS" ip=10.0.0.13 latency=28.667421326s method=POST status=500 url=/embeddings

After manually placing the model under /build/models

12:32PM DBG Request received: {"model":"distiluse-base-multilingual-cased-v1","language":"","translate":false,"n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"repeat_last_n":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","size":"","prompt":null,"instruction":"","input":"ping","stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"backend":"","model_base_name":""}
12:32PM DBG guessDefaultsFromFile: not a GGUF file
12:32PM DBG Parameter Config: &{PredictionOptions:{Model:distiluse-base-multilingual-cased-v1 Language: Translate:false N:0 TopP:0xc001456d80 TopK:0xc001456d88 Temperature:0xc001456d90 Maxtokens:0xc001456dc0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc001456db8 TypicalP:0xc001456db0 Seed:0xc001456dd8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:0xc001456d78 Threads:0xc001456d70 Debug:0xc001456dd0 Roles:map[] Embeddings:0xc001456dd1 Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil>} PromptStrings:[] InputStrings:[ping] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc001456da8 MirostatTAU:0xc001456da0 Mirostat:0xc001456d98 NGPULayers:0xc001456dc8 MMap:0xc001456dd0 MMlock:0xc001456dd1 LowVRAM:0xc001456dd1 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc001456d68 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:}
12:32PM DBG Model already loaded in memory: distiluse-base-multilingual-cased-v1
12:32PM DBG Model 'distiluse-base-multilingual-cased-v1' already loaded
12:32PM ERR Server error error="rpc error: code = Unknown desc = unimplemented" ip=10.0.0.13 latency=5.563407ms method=POST status=500 url=/embeddings
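
One avenue worth trying (a sketch, not verified here): pin the backend explicitly with a model definition, as fakezeta asked for earlier in the thread, so LocalAI does not probe every backend. The field names follow LocalAI's YAML model config format; the values are assumptions for this setup:

# Hypothetical model definition; adjust the path and names to your installation.
cat > /build/models/distiluse-base-multilingual-cased-v1.yaml <<'EOF'
name: distiluse-base-multilingual-cased-v1
backend: sentencetransformers
embeddings: true
parameters:
  model: sentence-transformers/distiluse-base-multilingual-cased-v1
EOF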

mbbyn avatar Jul 23 '24 12:07 mbbyn