Xuan-Son Nguyen
Multimodal support has been removed since https://github.com/ggerganov/llama.cpp/pull/5882. Depending on the refactoring of `llava`, we will be able to bring back support: https://github.com/ggerganov/llama.cpp/issues/6027 This issue is created mostly for tracking purposes....
# Motivation With the recent introduction of the `eval-callback` example, we now have more tools for debugging when working with llama.cpp. However, one of the tools that I feel is missing is...
Resolve #8655 Fix `llama_chat_format_single` incorrectly formatting the system message. Also added some logs and test cases for this. The output with this PR: ``` [1721763727] formatted: [INST] You are an assistant...
Ref discussion: https://github.com/ggerganov/llama.cpp/pull/8636#discussion_r1688268085 `llama_lora_adapter_clear()` can be used when the user wants to switch adapters but doesn't know which adapters are currently loaded into the `llama_context` and need to be removed. For simple task switching...
Ref: - https://github.com/ggerganov/llama.cpp/pull/8332#discussion_r1668929418 - https://github.com/ggerganov/llama.cpp/issues/8662#issuecomment-2247721136 These examples no longer work and require too much effort to maintain. Therefore, they need to be removed. It's always sad to say goodbye,...
Some ops like `ggml_scale` or `ggml_add` do not work very well with quantized types. To make sure we can merge a quantized base model with a lora adapter, we will dequantize...
Ref: https://github.com/ggerganov/llama.cpp/pull/8687#issuecomment-2252155218 (cc @ggerganov) TODO: - Train some adapters based on stories15M and [stories15M_MOE](https://huggingface.co/ngxson/stories15M_MOE) - Test with `llama-cli -m base_model.gguf --lora lora_adapter.gguf` - Test merging using `llama-export-lora`, then re-run the...
Generated by `scripts/generate-llm.ts`
The bug can be reproduced by calling `createCompletion('a very long input text')`; the function never returns.
Add an e2e browser test in CI