Xuan-Son Nguyen

Results: 49 issues by Xuan-Son Nguyen

The control vectors technique is a lightweight way to steer a model's behavior. Think of it as a very lightweight form of fine-tuning. Ref: https://github.com/ggerganov/llama.cpp/pull/5970

enhancement
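To illustrate the idea behind the issue above, here is a minimal Python sketch of how a control vector works conceptually. This is not llama.cpp code; the function names, shapes, and derivation (mean difference of activations from contrasting prompts) are illustrative assumptions.

```python
# Illustrative sketch (not llama.cpp's implementation): a control vector is a
# per-layer direction added to the hidden state at inference time.

def derive_control_vector(positive_acts, negative_acts):
    """Derive a direction as the mean difference between activations of
    contrasting prompts (e.g. 'happy' vs 'sad' completions)."""
    n = len(positive_acts)
    dim = len(positive_acts[0])
    vec = [0.0] * dim
    for pos, neg in zip(positive_acts, negative_acts):
        for i in range(dim):
            vec[i] += (pos[i] - neg[i]) / n
    return vec

def apply_control_vector(hidden_state, control_vector, scale=1.0):
    """Shift a layer's hidden state along the learned direction."""
    return [h + scale * c for h, c in zip(hidden_state, control_vector)]
```

Because inference only adds a precomputed vector per layer, the base weights stay untouched, which is why it is far cheaper than fine-tuning.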

We don't have many details for now. This issue currently tracks the upstream issue: https://github.com/ggerganov/llama.cpp/issues/7773

llama.cpp related

# What does this PR do?

Add a `load_tests` docker script, ready to test with an inference endpoint.

## Who can review?

To be discussed with @Vaibhavs10

(Hopefully) fix #8010

> [!IMPORTANT]
> This is still WIP; only the `simple` example is working.
> Collaborators are encouraged to discuss and give feedback on this.

## Motivation

Currently, the...

examples
python

Cont. https://github.com/ggerganov/llama.cpp/pull/9638#discussion_r1775324045

---

- [x] I have read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)
- Self-reported review complexity:
  - [x] Low

nix
devops

### What happened?

When using server completions (or chat completions) **without** streaming, it is impossible to cancel the request midway.

## To reproduce the problem

1. Compile and run the...

bug-unconfirmed
stale
server
low severity
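The fix the bug report above asks for amounts to polling the client connection between decode steps. Here is a minimal Python sketch of that desired behavior; the function names and token strings are hypothetical, not llama.cpp's server code.

```python
# Hypothetical sketch: between decode steps, check whether the client is
# still connected and abort generation early if it is not.

def generate(n_predict, is_client_connected):
    """Generate up to n_predict tokens, checking for a disconnect each step."""
    out = []
    for step in range(n_predict):
        if not is_client_connected():
            # Client went away mid-request: stop decoding instead of
            # finishing the full completion.
            return out, "cancelled"
        out.append(f"tok{step}")
    return out, "done"
```

With streaming, the disconnect surfaces naturally on the next write; without streaming, the server has to poll explicitly, which is what this issue is about.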

Initial reports can be seen in https://github.com/ggerganov/llama.cpp/pull/8227

> [!IMPORTANT]
> A note for everyone: if you think there's a bug in the llama.cpp tokenizer, please make sure to test with HF...

enhancement
stale
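When comparing llama.cpp tokenization against a Hugging Face reference as the note above suggests, the useful artifact in a bug report is the first position where the two token sequences diverge. A small helper for that, with hypothetical inputs (it takes any two token-ID lists):

```python
def first_divergence(toks_a, toks_b):
    """Return the index of the first differing token between two
    tokenizations, or None if they are identical."""
    for i, (a, b) in enumerate(zip(toks_a, toks_b)):
        if a != b:
            return i
    if len(toks_a) != len(toks_b):
        # One sequence is a strict prefix of the other.
        return min(len(toks_a), len(toks_b))
    return None
```

Reporting the divergence index (plus the surrounding text) makes tokenizer bugs reproducible without attaching the full sequences.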

### Prerequisites

- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [X] I searched using keywords...

enhancement
help wanted

### Feature Description

- Ability to run `llama-cli` in chat (conversation) mode automatically if there is a built-in chat template
- Having commands like `/regen`, `/readfile`, etc. (demo in https://github.com/ggerganov/llama.cpp/pull/10145)...

enhancement
stale
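The slash commands in the feature request above boil down to a small dispatcher in the chat loop. A minimal Python sketch, assuming a chat history of role/content dicts; the action tuples and helper name are illustrative, not the actual `llama-cli` design:

```python
def handle_command(line, history):
    """Dispatch a chat-mode slash command; return an (action, arg) tuple."""
    if line == "/regen":
        # Drop the last assistant reply so the model can regenerate it
        # from the same user prompt.
        if history and history[-1]["role"] == "assistant":
            history.pop()
        return ("regen", None)
    if line.startswith("/readfile "):
        # Hand the path back to the caller to read and inject as context.
        return ("readfile", line.split(" ", 1)[1])
    # Anything else is an ordinary chat message.
    return ("chat", line)
```

Keeping the dispatcher pure (it only edits the history and returns an action) makes each command easy to test in isolation.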

### Name and Version

While testing the rerank model on an HF inference endpoint, we got this error:

`GGML_ASSERT(strcmp(res->name, "result_output") == 0 && "missing result_output tensor") failed`

This is due to...

server