Using "falcon-ggml" / "falcon" backend for Falcon model leads to falcon_model_load: invalid model file(bad magic) error
LocalAI version:
commit 8034ed3473fb1c8c6f5e3864933c442b377be52e (HEAD -> master, origin/master, origin/HEAD)
Author: Jesús Espino <[email protected]>
Date: Sun Oct 15 09:17:41 2023 +0200
Environment, CPU architecture, OS, and Version:
- macOS Ventura 13.5.2 (22G91)
- Apple Silicon M2
Describe the bug
500 error when trying to load the model:
11:41AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:51272): stderr falcon_model_load: invalid model file '/Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo' (bad magic)
11:41AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:51272): stderr falcon_bootstrap: failed to load model from '/Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo'
[127.0.0.1]:51271 500 - POST /v1/chat/completions
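A quick way to check the container format of the downloaded file (a minimal sketch; the path matches the logs above, and head/xxd are standard macOS tools):

# Inspect the first four bytes (the container magic) of the model file.
# A GGUF file begins with the ASCII bytes "GGUF"; the older GGML
# loaders expect a different magic, so a GGUF file handed to a
# GGML-format backend gets rejected with "bad magic".
head -c 4 /Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo | xxd
# Expected output for a GGUF file:
# 00000000: 4747 5546                                GGUF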
To Reproduce
- Download the model: https://huggingface.co/hadongz/falcon-7b-instruct-gguf/blob/main/falcon-7b-instruct-q4_0.gguf
- Save it as ./models/gpt-3.5-turbo (the name is just an example, because I use the MacMind client)
- Add a file ./gpt-3.5-turbo.tmpl with this content:
You are an intelligent chatbot. Help the following question with brilliant answers.
Question: {{.Input}}
Answer:
- Add a file gpt-3.5-turbo.yaml with this content:
backend: falcon-ggml
context_size: 2000
f16: true
gpu_layers: 1
name: gpt-3.5-turbo
parameters:
  model: gpt-3.5-turbo
  temperature: 0.9
  top_k: 40
  top_p: 0.65
- Build LocalAI following the official docs for Apple Silicon (a command sketch follows this list)
- Start LocalAI with this command:
./local-ai --debug
- Run a request with curl:
(base) andrey@m2 current % curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "What is Abu-Dhabi?"}],
"temperature": 0.9
}'
{"created":1697527790,"object":"chat.completion","id":"9587206d-0939-4b40-8f5c-1a0695db9a5c","model":"gpt-3.5-turbo","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":" As an intelligent chatbot, I don't have a physical location, but Abu Dhabi is a city in the United Arab Emirates known for its luxurious lifestyle, beautiful beaches, and modern architecture.\u003c|endoftext|\u003e"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}%
- Got errors:
11:41AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:51272): stderr falcon_model_load: invalid model file '/Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo' (bad magic)
11:41AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:51272): stderr falcon_bootstrap: failed to load model from '/Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo'
[127.0.0.1]:51271 500 - POST /v1/chat/completions
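For the build step referenced in the list above, a command sketch (assuming the Makefile targets described in the LocalAI docs around this commit; the repository URL and BUILD_TYPE value are assumptions, not a verified transcript):

# Clone LocalAI and build it with Metal acceleration on Apple Silicon.
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
make BUILD_TYPE=metal build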
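Not part of the original report, but a possible workaround sketch: the downloaded file is GGUF, while the falcon-ggml backend reads the older GGML container, so pointing the same config at LocalAI's llama.cpp-based backend (an assumption: that this backend handles GGUF Falcon files in this build) may avoid the format mismatch:

backend: llama   # assumption: the llama.cpp-based backend, which reads GGUF
context_size: 2000
f16: true
gpu_layers: 1
name: gpt-3.5-turbo
parameters:
  model: gpt-3.5-turbo
  temperature: 0.9
  top_k: 40
  top_p: 0.65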
Expected behavior
- A successful chat completion instead of a 500 error.
Logs
(base) andrey@m2 current % ./local-ai --debug
11:41AM DBG no galleries to load
11:41AM INF Starting LocalAI using 4 threads, with models path: /Users/andrey/sandbox/local_ai/current/models
11:41AM INF LocalAI version: v1.30.0-28-g8034ed3 (8034ed3473fb1c8c6f5e3864933c442b377be52e)
11:41AM DBG Model: gpt-3.5-turbo (config: {PredictionOptions:{Model:gpt-3.5-turbo Language: N:0 TopP:0.65 TopK:40 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-3.5-turbo F16:true Threads:0 Debug:false Roles:map[] Embeddings:false Backend:falcon-ggml TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:1 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2000 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false DraftModel: NDraft:0 Quantization:} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}})
11:41AM DBG Extracting backend assets files to /tmp/localai/backend_data
┌───────────────────────────────────────────────────┐
│ Fiber v2.49.2 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 71 Processes ........... 1 │
│ Prefork ....... Disabled PID .............. 2836 │
└───────────────────────────────────────────────────┘
11:41AM DBG Request received:
11:41AM DBG Configuration read: &{PredictionOptions:{Model:gpt-3.5-turbo Language: N:0 TopP:0.65 TopK:40 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-3.5-turbo F16:true Threads:4 Debug:true Roles:map[] Embeddings:false Backend:falcon-ggml TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:1 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2000 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false DraftModel: NDraft:0 Quantization:} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
11:41AM DBG Parameters: &{PredictionOptions:{Model:gpt-3.5-turbo Language: N:0 TopP:0.65 TopK:40 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-3.5-turbo F16:true Threads:4 Debug:true Roles:map[] Embeddings:false Backend:falcon-ggml TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:1 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2000 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false DraftModel: NDraft:0 Quantization:} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
11:41AM DBG Prompt (before templating): What is Abu-Dhabi?
11:41AM DBG Template found, input modified to: You are an intelligent chatbot "Esenia". Help the following question with brilliant answers.
Question: What is Abu-Dhabi?
Answer:
11:41AM DBG Prompt (after templating): You are an intelligent chatbot "Esenia". Help the following question with brilliant answers.
Question: What is Abu-Dhabi?
Answer:
11:41AM DBG Loading model falcon-ggml from gpt-3.5-turbo
11:41AM DBG Loading model in memory from file: /Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo
11:41AM DBG Loading GRPC Model falcon-ggml: {backendString:falcon-ggml model:gpt-3.5-turbo threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0x140001029c0 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
11:41AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/falcon-ggml
11:41AM DBG GRPC Service for gpt-3.5-turbo will be running at: '127.0.0.1:51272'
11:41AM DBG GRPC Service state dir: /var/folders/f9/1b1jz83s4ysfn9zfncbsb8y40000gn/T/go-processmanager2128385065
11:41AM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:51272: connect: connection refused"
11:41AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:51272): stderr 2023/10/17 11:41:38 gRPC Server listening at 127.0.0.1:51272
11:41AM DBG GRPC Service Ready
11:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:gpt-3.5-turbo ContextSize:2000 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:1 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false DraftModel: AudioPath: Quantization:}
11:41AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:51272): stderr falcon_model_load: invalid model file '/Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo' (bad magic)
11:41AM DBG GRPC(gpt-3.5-turbo-127.0.0.1:51272): stderr falcon_bootstrap: failed to load model from '/Users/andrey/sandbox/local_ai/current/models/gpt-3.5-turbo'
[127.0.0.1]:51271 500 - POST /v1/chat/completions