Xuan-Son Nguyen

Results: 49 issues by Xuan-Son Nguyen

The control vectors technique is a lightweight way to steer a model's behavior. Think of it as a very lightweight form of fine-tuning. Ref: https://github.com/ggerganov/llama.cpp/pull/5970

enhancement
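To illustrate the idea behind the issue above, here is a minimal Python sketch of how a control vector works conceptually. This is not llama.cpp code; the function names, shapes, and derivation (mean difference of activations from contrasting prompts) are illustrative assumptions.

```python
# Illustrative sketch (not llama.cpp's implementation): a control vector is a
# per-layer direction added to the hidden state at inference time.

def derive_control_vector(positive_acts, negative_acts):
    """Derive a direction as the mean difference between activations of
    contrasting prompts (e.g. 'happy' vs 'sad' completions)."""
    n = len(positive_acts)
    dim = len(positive_acts[0])
    vec = [0.0] * dim
    for pos, neg in zip(positive_acts, negative_acts):
        for i in range(dim):
            vec[i] += (pos[i] - neg[i]) / n
    return vec

def apply_control_vector(hidden_state, control_vector, scale=1.0):
    """Shift a layer's hidden state along the learned direction."""
    return [h + scale * c for h, c in zip(hidden_state, control_vector)]
```

Because inference only adds a precomputed vector per layer, the base weights stay untouched, which is why it is far cheaper than fine-tuning.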

We don't have many details for now. This issue currently tracks the upstream issue: https://github.com/ggerganov/llama.cpp/issues/7773

llama.cpp related

# What does this PR do?

Add a `load_tests` docker script, ready to test with an inference endpoint.

## Who can review?

To be discussed with @Vaibhavs10

(Hopefully) fix #8010

> [!IMPORTANT]
> This is still WIP; only the `simple` example is working.
> Collaborators are encouraged to discuss and give feedback on this.

## Motivation

Currently, the...

examples
python

Cont. https://github.com/ggerganov/llama.cpp/pull/9638#discussion_r1775324045

---

- [x] I have read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)
- Self-reported review complexity:
  - [x] Low

nix
devops

### What happened?

When using server completions (or chat completions) **without** streaming, it is impossible to cancel the request midway.

## To reproduce the problem

1. Compile and run the...

bug-unconfirmed
stale
server
low severity
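The fix the bug report above asks for amounts to polling the client connection between decode steps. Here is a minimal Python sketch of that desired behavior; the function names and token strings are hypothetical, not llama.cpp's server code.

```python
# Hypothetical sketch: between decode steps, check whether the client is
# still connected and abort generation early if it is not.

def generate(n_predict, is_client_connected):
    """Generate up to n_predict tokens, checking for a disconnect each step."""
    out = []
    for step in range(n_predict):
        if not is_client_connected():
            # Client went away mid-request: stop decoding instead of
            # finishing the full completion.
            return out, "cancelled"
        out.append(f"tok{step}")
    return out, "done"
```

With streaming, the disconnect surfaces naturally on the next write; without streaming, the server has to poll explicitly, which is what this issue is about.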

Initial reports can be seen in https://github.com/ggerganov/llama.cpp/pull/8227

> [!IMPORTANT]
> A note for everyone: if you think there's a bug in the llama.cpp tokenizer, please make sure to test with HF...

enhancement
stale
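When comparing llama.cpp tokenization against a Hugging Face reference as the note above suggests, the useful artifact in a bug report is the first position where the two token sequences diverge. A small helper for that, with hypothetical inputs (it takes any two token-ID lists):

```python
def first_divergence(toks_a, toks_b):
    """Return the index of the first differing token between two
    tokenizations, or None if they are identical."""
    for i, (a, b) in enumerate(zip(toks_a, toks_b)):
        if a != b:
            return i
    if len(toks_a) != len(toks_b):
        # One sequence is a strict prefix of the other.
        return min(len(toks_a), len(toks_b))
    return None
```

Reporting the divergence index (plus the surrounding text) makes tokenizer bugs reproducible without attaching the full sequences.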

### Prerequisites

- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [X] I searched using keywords...

enhancement
help wanted

### Feature Description

- Ability to run `llama-cli` in chat (conversation) mode automatically if there is a built-in chat template
- Having commands like `/regen`, `/readfile`, etc. (demo in https://github.com/ggerganov/llama.cpp/pull/10145)...

enhancement
stale
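The slash commands in the feature request above boil down to a small dispatcher in the chat loop. A minimal Python sketch, assuming a chat history of role/content dicts; the action tuples and helper name are illustrative, not the actual `llama-cli` design:

```python
def handle_command(line, history):
    """Dispatch a chat-mode slash command; return an (action, arg) tuple."""
    if line == "/regen":
        # Drop the last assistant reply so the model can regenerate it
        # from the same user prompt.
        if history and history[-1]["role"] == "assistant":
            history.pop()
        return ("regen", None)
    if line.startswith("/readfile "):
        # Hand the path back to the caller to read and inject as context.
        return ("readfile", line.split(" ", 1)[1])
    # Anything else is an ordinary chat message.
    return ("chat", line)
```

Keeping the dispatcher pure (it only edits the history and returns an action) makes each command easy to test in isolation.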

### Name and Version

While testing the rerank model on an HF inference endpoint, we got this error:

`GGML_ASSERT(strcmp(res->name, "result_output") == 0 && "missing result_output tensor") failed`

This is due to...

server