llama.cpp
The server exits abnormally when running in Docker
llama_model_loader: loaded meta data with 20 key-value pairs and 259 tensors from /models/qwen7b-chat-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen
llama_model_loader: - kv 1: general.name str = Qwen
llama_model_loader: - kv 2: qwen.context_length u32 = 8192
llama_model_loader: - kv 3: qwen.block_count u32 = 32
llama_model_loader: - kv 4: qwen.embedding_length u32 = 4096
llama_model_loader: - kv 5: qwen.feed_forward_length u32 = 22016
llama_model_loader: - kv 6: qwen.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 7: qwen.rope.dimension_count u32 = 128
llama_model_loader: - kv 8: qwen.attention.head_count u32 = 32
llama_model_loader: - kv 9: qwen.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 12: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 13: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 14: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 151643
llama_model_loader: - kv 18: general.quantization_version u32 = 2
llama_model_loader: - kv 19: general.file_type u32 = 2
llama_model_loader: - type f32: 97 tensors
llama_model_loader: - type q4_0: 161 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 293/151936 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 151936
llm_load_print_meta: n_merges = 151387
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 22016
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 7.72 B
llm_load_print_meta: model size = 4.20 GiB (4.67 BPW)
llm_load_print_meta: general.name = Qwen
llm_load_print_meta: BOS token = 151643 '[PAD151643]'
llm_load_print_meta: EOS token = 151643 '[PAD151643]'
llm_load_print_meta: UNK token = 151643 '[PAD151643]'
llm_load_print_meta: PAD token = 151643 '[PAD151643]'
llm_load_print_meta: LF token = 148848 'ÄĬ'
llm_load_tensors: ggml ctx size = 0.10 MiB
llm_load_tensors: system memory used = 4297.31 MiB
...................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 307.94 MiB
GGML_ASSERT: ggml.c:16656: rc == 0
/app/.devops/tools.sh: line 45: 8 Aborted (core dumped) ./server "$@"
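For context, the GGML_ASSERT on rc == 0 in ggml.c appears to be the check on the return code of worker-thread creation in ggml's graph-compute path, so the abort most likely means the container could not spawn compute threads, not that the model file is broken. One way to test that theory is to look at the limits the container actually receives. A hypothetical diagnostic sketch, assuming the image ships bash (ulimit and /proc/sys/kernel/threads-max are standard Linux, not llama.cpp specifics):

# Override the image entrypoint and print the per-user process/thread limit
# plus the kernel-wide thread ceiling, as seen from inside the container.
docker run --rm --entrypoint /bin/bash ghcr.io/ggerganov/llama.cpp:full \
  -c 'ulimit -u; cat /proc/sys/kernel/threads-max'

If ulimit -u reports a very small value, thread creation can fail as soon as the server starts its worker pool.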
The command is:
docker run -p 8080:8080 -v /root/llama2023.cpp/models:/models --workdir=/app --runtime=runc ghcr.io/ggerganov/llama.cpp:full -s -m /models/qwen7b-chat-q4_0.gguf -c 2048 -n 512 --host 0.0.0.0 --parallel 10
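If thread limits turn out to be the culprit, two knobs may help: lifting the container's task limit and shrinking the server's thread pool. An untested workaround sketch (--pids-limit is a standard Docker flag, --threads is the server's thread-count option; the value 4 is illustrative, not a recommendation):

# Same command as in the report, plus an unlimited pids cgroup and a
# smaller compute thread pool; all other flags are unchanged.
docker run -p 8080:8080 --pids-limit=-1 \
  -v /root/llama2023.cpp/models:/models --workdir=/app --runtime=runc \
  ghcr.io/ggerganov/llama.cpp:full -s -m /models/qwen7b-chat-q4_0.gguf \
  -c 2048 -n 512 --host 0.0.0.0 --parallel 10 --threads 4

Note that lowering --parallel changes the number of serving slots, not the size of the compute thread pool; --threads is the direct control.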
@AppleJunJiang Please use English in the title and description. That will help developers understand your issue.
This issue was closed because it has been inactive for 14 days since being marked as stale.