llama_model_load: unknown tensor '' in model file
Server running on http://localhost:3000/
> query: { method: 'installed' }
modelsPath C:\Users\VTSTech\dalai\alpaca\models
{ modelFolders: [] }
modelsPath C:\Users\VTSTech\dalai\llama\models
{ modelFolders: [ '7B' ] }
exists 7B
> query: {
seed: -1,
threads: 4,
n_predict: '400',
top_k: 40,
top_p: 0.9,
temp: 0.1,
repeat_last_n: 64,
repeat_penalty: 1.3,
debug: true,
models: [ 'llama.7B' ],
model: 'llama.7B',
prompt: 'Hi,',
id: 'TS-1679172386113-93764'
}
{ Core: 'llama', Model: '7B' }
exec: C:\Users\VTSTech\dalai\llama\build\Release\llama --seed -1 --threads 4 --n_predict 400 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.1 --repeat_last_n 64 --repeat_penalty 1.3 -p "Hi," in C:\Users\VTSTech\dalai\llama
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\VTSTech\dalai\llama> C:\Users\VTSTech\dalai\llama\build\Release\llama --seed -1 --threads 4 --n_predict 400 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.1 --repeat_last_n 64 --repeat_penalty 1.3 -p "Hi,"
main: seed = 1679172387
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'models/7B/ggml-model-q4_0.bin'
llama_model_load: llama_model_load: unknown tensor '' in model file
main: failed to load model from 'models/7B/ggml-model-q4_0.bin'
PS C:\Users\VTSTech\dalai\llama> exit
Just to report a very similar issue running llama.cpp directly:
% ./main -m /Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin -t 8 -n 128 -p 'PROMPT HERE XXXX '
main: seed = 1679628160
llama_model_load: loading model from '/Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 2
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size = 800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/2 from '/Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin'
llama_model_load: ............................................. done
llama_model_load: model size = 3880.49 MB / num tensors = 363
llama_model_load: loading model part 2/2 from '/Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin.1'
llama_model_load: llama_model_load: unknown tensor '' in model file
llama_init_from_file: failed to load model
main: error: failed to load model '/Volumes/easystore/LLaMA/13B/ggml-model-q4_0.bin'
I had the same issue. The error message highlights that a second quantized binary is missing for the 13B model: "ggml-model-q4_0.bin.1"
This is because of the model size: the 13B model is converted into two parts, so you can see two binaries for the f16 files:
- ggml-model-f16.bin
- ggml-model-f16.bin.1
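A quick way to confirm which parts are present is to list the model directory (the path below assumes the default llama.cpp layout):
ls -lh ./models/13B/ggml-model-*
# after conversion you should see ggml-model-f16.bin and ggml-model-f16.bin.1;
# after quantizing both parts, ggml-model-q4_0.bin and ggml-model-q4_0.bin.1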
I think the same issue will happen with both of the bigger models.
You need to run the quantize command twice, once per part (the second part has the ".1" suffix):
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
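The same pattern extends to the larger models, which are split into more parts (4 for 30B, 8 for 65B). As a sketch, assuming the standard ggml file layout shown above, a small shell loop quantizes every part in one go:
# quantize every f16 part of a model (here 30B); the trailing "2" selects q4_0
for f in ./models/30B/ggml-model-f16.bin*; do
  ./quantize "$f" "${f/f16/q4_0}" 2
done
The ${f/f16/q4_0} substitution only swaps f16 for q4_0 in each output path, so part suffixes like ".1" are preserved.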