llama_model_load: unknown tensor '' in model file
Server running on http://localhost:3000/
> query: { method: 'installed' }
modelsPath C:\Users\VTSTech\dalai\alpaca\models
{ modelFolders: [] }
modelsPath C:\Users\VTSTech\dalai\llama\models
{ modelFolders: [ '7B' ] }
exists 7B
> query: {
seed: -1,
threads: 4,
n_predict: '400',
top_k: 40,
top_p: 0.9,
temp: 0.1,
repeat_last_n: 64,
repeat_penalty: 1.3,
debug: true,
models: [ 'llama.7B' ],
model: 'llama.7B',
prompt: 'Hi,',
id: 'TS-1679172386113-93764'
}
{ Core: 'llama', Model: '7B' }
exec: C:\Users\VTSTech\dalai\llama\build\Release\llama --seed -1 --threads 4 --n_predict 400 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.1 --repeat_last_n 64 --repeat_penalty 1.3 -p "Hi," in C:\Users\VTSTech\dalai\llama
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\VTSTech\dalai\llama> C:\Users\VTSTech\dalai\llama\build\Release\llama --seed -1 --threads 4 --n_predict 400 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.1 --repeat_last_n 64 --repeat_penalty 1.3 -p "Hi,"
main: seed = 1679172387
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'models/7B/ggml-model-q4_0.bin'
llama_model_load: llama_model_load: unknown tensor '' in model file
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
PS C:\Users\VTSTech\dalai\llama> exit
Just to report a very similar issue running llama.cpp directly:
% ./main -m /Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin -t 8 -n 128 -p 'PROMPT HERE XXXX '
main: seed = 1679628160
llama_model_load: loading model from '/Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 2
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size = 800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/2 from '/Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin'
llama_model_load: ............................................. done
llama_model_load: model size = 3880.49 MB / num tensors = 363
llama_model_load: loading model part 2/2 from '/Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin.1'
llama_model_load: llama_model_load: unknown tensor '' in model file
llama_init_from_file: failed to load model
main: error: failed to load model '/Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin'
I had the same issue. The error message highlights that a second quantized binary is missing for the 13B model: "ggml-model-q4_0.bin.1"
This is because of the model size: the 13B model is converted into two parts, so you can see two binaries for the f16 files:
- ggml-model-f16.bin
- ggml-model-f16.bin.1
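A quick way to confirm which parts are present is to list the model directory (the path below assumes the default llama.cpp layout):
ls -lh ./models/13B/ggml-model-*
# after conversion you should see ggml-model-f16.bin and ggml-model-f16.bin.1;
# after quantizing both parts, ggml-model-q4_0.bin and ggml-model-q4_0.bin.1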
I think the same issue will happen with both of the bigger models.
You need to run the quantize command twice, once per part (the second part has the ".1" suffix):
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
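The same pattern extends to the larger models, which are split into more parts (4 for 30B, 8 for 65B). As a sketch, assuming the standard ggml file layout shown above, a small shell loop quantizes every part in one go:
# quantize every f16 part of a model (here 30B); the trailing "2" selects q4_0
for f in ./models/30B/ggml-model-f16.bin*; do
  ./quantize "$f" "${f/f16/q4_0}" 2
done
The ${f/f16/q4_0} substitution only swaps f16 for q4_0 in each output path, so part suffixes like ".1" are preserved.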