llama.cpp
LLM inference in C/C++
### Name and Version

root@f7545b6b4f65:/app# ./llama-cli --version
load_backend: loaded CPU backend from ./libggml-cpu-alderlake.so
version: 4460 (ba8a1f9c)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

### Operating systems

Linux, Other? (Please...
### Git commit

git rev-parse HEAD
c5ede3849fc021174862f9c0bf8273808d8f0d39

### Operating systems

Linux

### GGML backends

CUDA

### Problem description & steps to reproduce

I want to build llama.cpp from source in...
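For reference, a typical from-source build with the CUDA backend looks roughly like the sketch below. This is a minimal example assuming git, CMake, a C/C++ toolchain and the CUDA toolkit are installed; the reporter's actual environment and flags are not shown in the truncated issue and may differ.

```bash
# Minimal sketch: build llama.cpp from source with the CUDA backend.
# Assumes the CUDA toolkit and CMake are already installed.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```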
* Add a new option `GGML_HIP_ROCWMMA_FATTN` that defaults to OFF
* Check for rocWMMA header availability when `GGML_HIP_ROCWMMA_FATTN` is enabled
* Define `FP16_MMA_AVAILABLE` when `GGML_HIP_ROCWMMA_FATTN` is enabled and target is...
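For context, enabling an option like this at configure time would look roughly as follows. Only `GGML_HIP_ROCWMMA_FATTN` comes from this PR; the surrounding HIP flags (`GGML_HIP`, `AMDGPU_TARGETS`, the `gfx1100` target) are the usual llama.cpp AMD build settings and are assumptions here, not part of the change.

```bash
# Sketch: configure a HIP build with the proposed rocWMMA FlashAttention option.
# GGML_HIP_ROCWMMA_FATTN defaults to OFF, so it must be enabled explicitly.
cmake -B build \
  -DGGML_HIP=ON \
  -DGGML_HIP_ROCWMMA_FATTN=ON \
  -DAMDGPU_TARGETS=gfx1100 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```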
### Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
  Device 0: Tesla P40, compute capability 6.1, VMM: yes
  Device 1: Tesla P40, compute capability...
Add a quick-access feature to the webui for custom configurations (including the prompt). As requested by @xydac in an old PR, the chat prompts from https://github.com/f/awesome-chatgpt-prompts have been loaded in...
CUDA 12.8 added an [option](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#compress-mode-default-size-speed-balance-none-compress-mode) to specify stronger compression for binaries. I ran some tests in CI with the [new CUDA 12.8 Ubuntu docker image](https://hub.docker.com/r/nvidia/cuda/):

## `89-real` arch

In this scenario,...
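The flag is passed to nvcc as `--compress-mode=<mode>` (modes per the linked docs: default, size, speed, balance, none). In a CMake build it can be forwarded roughly as in the sketch below; this assumes the generic `CMAKE_CUDA_FLAGS`/`CMAKE_CUDA_ARCHITECTURES` variables and the `89-real` architecture mentioned above, and does not reflect the exact CI wiring.

```bash
# Sketch: request stronger fatbin compression from nvcc (CUDA >= 12.8)
# when building only the sm_89 real architecture.
cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=89-real \
  -DCMAKE_CUDA_FLAGS="--compress-mode=size"
cmake --build build --config Release -j
```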
### Name and Version

version: 4747 (c5d91a74)
built with cc (Debian 11.3.0-12) 11.3.0 for x86_64-linux-gnu

### Problem description & steps to reproduce

Webui unusably slow over network due to forced...