llama_cpp.rb

Error when trying to run the chat.rb example.

mybuddyandrew opened this issue · 1 comment

When trying to run the chat example, I am getting an error.

ruby chat.rb --model /playingwithai/models/llama-2-7b-chat.Q8_0.gguf
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /playingwithai/models/llama-2-7b-chat.Q8_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = llama
llama_model_loader: - kv   1: general.name str = LLaMA v2
llama_model_loader: - kv   2: llama.context_length u32 = 4096
llama_model_loader: - kv   3: llama.embedding_length u32 = 4096
llama_model_loader: - kv   4: llama.block_count u32 = 32
llama_model_loader: - kv   5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv   6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv   7: llama.attention.head_count u32 = 32
llama_model_loader: - kv   8: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv   9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv  10: general.file_type u32 = 7
llama_model_loader: - kv  11: tokenizer.ggml.model str = llama
llama_model_loader: - kv  12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv  16: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv  17: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv  18: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 6.74 B
llm_load_print_meta: model size = 6.67 GiB (8.50 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: system memory used = 6828.75 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 73.69 MiB

Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
chat.rb:73:in `block in main': undefined method `get_one' for LLaMACpp::Batch:Class (NoMethodError)

      context.decode(LLaMACpp::Batch.get_one(tokens: embd[i...(i + n_eval)], n_tokens: n_eval, pos_zero: n_past, seq_id: 0))
                                    ^^^^^^^^
from chat.rb:71:in `step'
from chat.rb:71:in `main'
from /.asdf/installs/ruby/3.2.1/lib/ruby/gems/3.2.0/gems/thor-1.3.0/lib/thor/command.rb:28:in `run'
from /.asdf/installs/ruby/3.2.1/lib/ruby/gems/3.2.0/gems/thor-1.3.0/lib/thor/invocation.rb:127:in `invoke_command'
from /.asdf/installs/ruby/3.2.1/lib/ruby/gems/3.2.0/gems/thor-1.3.0/lib/thor.rb:527:in `dispatch'
from /.asdf/installs/ruby/3.2.1/lib/ruby/gems/3.2.0/gems/thor-1.3.0/lib/thor/base.rb:584:in `start'
from chat.rb:196:in `<main>'

I am happy to dig around. I'm not really sure where to start. 😓

mybuddyandrew · Jan 10 '24

The example script on the main branch is under development and uses APIs that are not yet in the released gem (hence the NoMethodError for LLaMACpp::Batch.get_one above), so it is better to run it with the gem built from the main branch:

$ git clone https://github.com/yoshoku/llama_cpp.rb.git
$ cd llama_cpp.rb
$ rake install
$ cd examples
$ bundle install
$ ruby chat.rb --model ...
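
After installing from main, you can confirm that the method the example calls is actually available. A minimal check, based on the NoMethodError above (get_one is a class method on LLaMACpp::Batch):

    require 'llama_cpp'
    # Prints true once the installed gem provides the class method chat.rb calls
    puts LLaMACpp::Batch.respond_to?(:get_one)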

Alternatively, you can check out the example script at the tag matching the version of llama_cpp you have installed:

$ git clone https://github.com/yoshoku/llama_cpp.rb.git
$ cd llama_cpp.rb
$ git checkout -b example_script v0.11.1
$ cd examples
$ bundle install
$ ruby chat.rb --model ...
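
To find which tag to check out, you can print the version of the gem you already have installed. A small sketch, assuming the gem exposes a VERSION constant in the usual Ruby gem convention:

    require 'llama_cpp'
    # Prints something like "0.11.1"; check out the matching vX.Y.Z tag
    puts LLaMACpp::VERSION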

yoshoku · Jan 10 '24