llama.cpp
LLM inference in C/C++
### Name and Version
build: 4761 (cad53fc9) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

### Operating systems
Linux

### Which llama.cpp modules do you know to be affected?
llama-server

###...
The lowest architecture supported by CUDA 12 is Maxwell, and compute capability 5.0 is the lowest one in the Maxwell family.
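As a concrete illustration (a sketch, not the project's documented procedure; the `GGML_CUDA` option and the standard CMake `CMAKE_CUDA_ARCHITECTURES` variable are assumed here), a CUDA 12 build targeting that minimum architecture would pin the compute capability to 5.0:

```
# Hypothetical configure step pinning the CUDA architecture to Maxwell (5.0),
# the oldest compute capability CUDA 12 accepts.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=50
cmake --build build --config Release -j
```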
This PR fixes the bug outlined in this issue: https://github.com/ggml-org/llama.cpp/issues/10157. It is also discussed in projects that leverage llama.cpp, such as Ollama: https://github.com/ollama/ollama/issues/7441 https://github.com/ollama/ollama-python/issues/433

### Summary
In `clip.cpp`, we initialize a...
### Prerequisites
- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords...
### Name and Version
version: 4526 (a94f3b27) built with cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 for x86_64-linux-gnu

Not sure when this started, but previously, when using llama-cli with --log-disable, I would...
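For context, an invocation exercising that flag would look like the following (model path and prompt are placeholders, not taken from the report):

```
# Example run with logging disabled; -m and -p values are placeholders.
./llama-cli -m ./model.gguf -p "Hello" --log-disable
```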
### Name and Version
./llama-cli --version
CANNOT LINK EXECUTABLE "./llama-cli": library "libomp.so" not found: needed by main executable

### Operating systems
Other? (Please let us know in description)

### GGML...
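The error is the dynamic linker failing to locate the OpenMP runtime at startup. A common workaround, offered as a sketch assuming an Android NDK cross-build (the NDK library path below is hypothetical), is to either build without OpenMP or ship libomp.so alongside the binary:

```
# Option 1: configure without OpenMP so libomp.so is never required
# (GGML_OPENMP is assumed to be the relevant build option).
cmake -B build -DGGML_OPENMP=OFF

# Option 2: push the libomp.so bundled with your NDK (path is hypothetical)
# and point the loader at it.
adb push "$NDK/.../libomp.so" /data/local/tmp/
adb shell 'cd /data/local/tmp && LD_LIBRARY_PATH=. ./llama-cli --version'
```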
According to this comment, https://github.com/ggerganov/llama.cpp/discussions/336#discussioncomment-11184134, there is a new CoreML API, and an ANE backend might be possible to implement with the latest Apple software/hardware.
```
root@orangepiaipro-20t:/data/llama.cpp# cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=release
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- Including...
```
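If that configure step completes, the follow-up build step would be the standard CMake one (a sketch; it is not part of the log above):

```
# Standard CMake build step following the configure command shown above.
cmake --build build --config release -j
```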
I'm trying to convert this GGML model to GGUF, but I get this error. Thank you.

```
python convert_llama_ggml_to_gguf.py --input "D:\nectec\model\llama-2-13b-chat.ggmlv3.q2_K.bin" --output "D:\nectec\model\llama-2-13b-chat.gguf"
INFO:ggml-to-gguf:* Using config: Namespace(input=WindowsPath('D:/nectec/model/llama-2-13b-chat.ggmlv3.q2_K.bin'), output=WindowsPath('D:/nectec/model/llama-2-13b-chat.gguf'), name=None, desc=None, gqa=8, eps='0',...
```
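One likely culprit, offered as an assumption rather than a confirmed diagnosis: the logged config shows gqa=8 and eps='0', but the converter's --gqa 8 setting is meant for LLaMA-2 70B, and LLaMA-2 models use an RMS-norm eps of 1e-5. A 13B conversion would then look like:

```
# Hypothetical corrected invocation for a 13B LLaMA-2 model (assumption:
# keep the default --gqa 1 for 13B, and pass --eps 1e-5 for LLaMA-2).
python convert_llama_ggml_to_gguf.py --input "D:\nectec\model\llama-2-13b-chat.ggmlv3.q2_K.bin" --output "D:\nectec\model\llama-2-13b-chat.gguf" --eps 1e-5
```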