Xuan-Son Nguyen
Multimodal support has been removed since https://github.com/ggerganov/llama.cpp/pull/5882. Depending on the refactoring of `llava`, we will be able to bring back support: https://github.com/ggerganov/llama.cpp/issues/6027 This issue is created mostly for tracking purposes....
# Motivation With the recent introduction of the `eval-callback` example, we now have more tools for debugging when working with llama.cpp. However, one of the tools that I feel is missing is...
Resolve #8655 Fix `llama_chat_format_single` incorrectly formatting the system message. Also added some logs and test cases for this. The output with this PR: ``` [1721763727] formatted: [INST] You are an assistant...
Ref discussion: https://github.com/ggerganov/llama.cpp/pull/8636#discussion_r1688268085 `llama_lora_adapter_clear()` can be used when the user wants to switch adapters but doesn't know which adapters are currently loaded into the `llama_context` and need to be removed. For simple task switching...
Ref: - https://github.com/ggerganov/llama.cpp/pull/8332#discussion_r1668929418 - https://github.com/ggerganov/llama.cpp/issues/8662#issuecomment-2247721136 These examples no longer work and require too much effort to maintain. Therefore, they need to be removed. It's always sad to say goodbye,...
Some ops like `ggml_scale` or `ggml_add` do not work very well with quantized types. To make sure we can merge a quantized base model with a lora adapter, we will dequantize...
Ref: https://github.com/ggerganov/llama.cpp/pull/8687#issuecomment-2252155218 (cc @ggerganov) TODO: - Train some adapters based on stories15M and [stories15M_MOE](https://huggingface.co/ngxson/stories15M_MOE) - Test with `llama-cli -m base_model.gguf --lora lora_adapter.gguf` - Test merging using `llama-export-lora`, then re-run the...
Generated by `scripts/generate-llm.ts`
The bug can be reproduced by calling `createCompletion('a very long input text')`; the function never returns.
Add an e2e browser test in CI