llama.cpp
LLM inference in C/C++
Hi there! I ran into this issue when trying to use higher values of `-b` and `-ub` with DeepSeek V3, since doing so increases prompt processing (PP) performance...
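As context for readers unfamiliar with these flags: `-b`/`--batch-size` sets the logical batch size and `-ub`/`--ubatch-size` the physical (micro) batch size in llama.cpp. A hedged sketch of the kind of invocation being described — the model path and the specific values are placeholders, not the reporter's exact setup:

```shell
# Placeholder model path and batch sizes; raising -b/-ub can improve
# prompt-processing throughput at the cost of more memory.
./llama-server -m ./models/deepseek-v3.gguf -b 4096 -ub 1024
```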
### Name and Version

Custom build of the llama.cpp library from b5022 (older versions crash as well)

### Operating systems

Windows

### GGML backends

CUDA

### Hardware

RTX 3080 Ti, i7-12700F

### Models

...
### Git commit

1682e39aa5bb1699fae3f760450be2e76d35a6a1

### Operating systems

Linux

### GGML backends

CUDA

### Problem description & steps to reproduce

Tell CMake where to find the compiler by setting either the...
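The truncated error above is CMake's standard message when it cannot locate a compiler. One common fix is to pass the compiler path explicitly when configuring; a sketch, assuming a typical CUDA install under `/usr/local/cuda` (the nvcc path is an assumption, not taken from the report):

```shell
# Assumed nvcc location; adjust to your CUDA toolkit install.
cmake -B build -DGGML_CUDA=ON \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
cmake --build build --config Release
```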
I run into this issue on nearly every `-bf` file when using `llama-perplexity` with `--multiple-choice`. Any idea what happened, or what should I do to fix this?...
Hi, I'm currently facing this `tokenizer_name NotImplementedError` while testing a quantized `.gguf` model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). I'm having this trouble with `--apply_chat_template`. Run command: `lm_eval --model gguf --model_args base_url=http://127.0.1.1:8080 --tasks gsm8k --output_path result/gsm8k`...
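For readers reproducing this, a cleaned-up sketch of the invocation pattern described above: a llama.cpp server exposing an OpenAI-style endpoint, queried by lm-evaluation-harness's `gguf` model backend. The URL and output path are placeholders mirroring the report, not verified values:

```shell
# Start a llama.cpp server first (placeholder model path), e.g.:
#   ./llama-server -m ./models/model-q4_k_m.gguf --port 8080
# Then point lm-eval at it; --apply_chat_template is the flag
# the reporter says triggers the error.
lm_eval --model gguf \
  --model_args base_url=http://127.0.1.1:8080 \
  --tasks gsm8k \
  --apply_chat_template \
  --output_path result/gsm8k
```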
### Name and Version

Compiled at commit https://github.com/ggml-org/llama.cpp/commit/6562e5a4d6c58326dcd79002ea396d4141f1b18e, but it also happens on the latest master version.

### Operating systems

Mac

### Which llama.cpp modules do you know to be...
Added two new configuration presets to simplify command-line usage:

1. `--chat-llama3-8b-default` for running a chat server with the Llama 3 8B model
2. `--rerank-bge-default` for running a reranking server with the BGE...
In this PR:

- Remove `libllava` - it contains too much redundant and unsafe code, and `libmtmd` already covers all of its use cases with a better API
- Remove `clip-quantize-cli`...
This gives a 1.5× generation speed-up for Qwen VL models (tested on a MacBook M3 Max).

`master` branch:

| model | size | params | backend | threads | test | t/s...