
LLM inference in C/C++

1628 llama.cpp issues

On machines with smaller memory and slower processors, it can be useful to reduce the overall number of threads running. For instance, on my MacBook Pro (Intel i5, 16 GB),...

question
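On the thread-count point, here is a minimal sketch of how a caller might clamp the requested thread count to the cores actually available; `pick_n_threads` is a hypothetical helper, not part of llama.cpp:

```c
#include <unistd.h>

// Hypothetical helper: cap the requested worker-thread count at the number
// of online logical cores, so small machines are not oversubscribed.
static int pick_n_threads(int requested) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN); // logical cores currently online
    if (cores < 1) cores = 1;                   // fall back if detection fails
    if (requested > (int) cores) requested = (int) cores;
    return requested < 1 ? 1 : requested;
}
```

In practice the same effect is available from the command line via `./main`'s `-t`/`--threads` flag.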

Per [this twitter thread](https://twitter.com/theshawwn/status/1632569215348531201). See commit [here](https://github.com/shawwn/llama/commit/40d99d329a5e38d85904d3a6519c54e6dd6ee9e1).

Hey! Thank you for your amazing work! I'm curious: is it possible to use RLHF feedback after a response to make small incremental adjustments in a tuning process? For example,...

The initial `make` fails with `CLOCK_MONOTONIC undeclared`:

```
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx...
```

help wanted
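For context on the error above: `CLOCK_MONOTONIC` is POSIX, not ISO C, so glibc hides it under a strict `-std=c11`. A common workaround, sketched here rather than taken from the project's actual fix, is to define the POSIX feature-test macro before any system header:

```c
// CLOCK_MONOTONIC is POSIX, not ISO C; under strict -std=c11 glibc hides it
// unless a feature-test macro is defined before the first system header.
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts); // monotonic clock, immune to wall-clock jumps
    printf("%ld.%09ld\n", (long) ts.tv_sec, ts.tv_nsec);
    return 0;
}
```

Compiling with `-std=gnu11` instead of `-std=c11` has the same effect.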

I can achieve around 1 token per second on a Ryzen 7 3700X on Linux with the 65B model and 4-bit quantization. If we use 8-bit instead, would it run...

enhancement
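Back-of-the-envelope arithmetic makes the 4-bit vs. 8-bit trade-off concrete; these are rough weights-only estimates, ignoring activations and runtime overhead:

```c
#include <stdio.h>

int main(void) {
    const double params = 65e9; // 65B parameters
    // Approximate weights-only footprint at different quantization widths.
    printf("4-bit: ~%.0f GB\n", params * 0.5 / 1e9); // 0.5 byte per weight
    printf("8-bit: ~%.0f GB\n", params * 1.0 / 1e9); // 1 byte per weight
    printf("fp16 : ~%.0f GB\n", params * 2.0 / 1e9); // 2 bytes per weight
    return 0;
}
```

Since CPU inference at this scale is largely memory-bandwidth-bound, doubling the bytes per weight also roughly doubles the time per token, so 8-bit would likely be slower than 4-bit, not faster.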

Where are the benchmarks for various hardware, e.g. Apple Silicon?

documentation
question

First of all, tremendous work, Georgi! I managed to run your project with small adjustments on: Intel(R) Core(TM) i7-10700T CPU @ 2.00GHz / 16 GB, as a 64-bit app,...

enhancement
help wanted
good first issue

This would be the initial PR to be able to compile stuff on Windows. In particular, MSVC is very picky about which features you can and cannot use. With...
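One classic example of that pickiness, given as an illustrative sketch rather than a diff from the actual port: MSVC never adopted C99 variable-length arrays, so runtime-sized stack buffers have to become heap allocations:

```c
#include <stdlib.h>
#include <string.h>

void process(size_t n) {
    // float buf[n];  // C99 VLA: accepted by GCC/Clang, rejected by MSVC
    float *buf = malloc(n * sizeof(float)); // portable replacement
    if (!buf) return;
    memset(buf, 0, n * sizeof(float));
    /* ... work on buf ... */
    free(buf);
}
```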

This prompt with the 65B model on an M1 Max (64 GB) results in a segmentation fault. It works with the 30B model. Are there problems with longer prompts? Related to #12 ```...

bug
need more info
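One plausible cause, offered as an assumption rather than a confirmed diagnosis: a prompt that tokenizes to more tokens than the model's context window can overrun fixed-size buffers downstream. A defensive check might look like the following, where `n_prompt_tokens` and `n_ctx` are illustrative names:

```c
#include <stdio.h>

// Hypothetical guard: reject prompts that exceed the context window
// instead of letting them overrun fixed-size buffers later on.
int check_prompt_fits(int n_prompt_tokens, int n_ctx) {
    if (n_prompt_tokens >= n_ctx) {
        fprintf(stderr, "prompt is %d tokens but the context holds only %d\n",
                n_prompt_tokens, n_ctx);
        return 0;
    }
    return 1;
}
```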

The `./main` program currently outputs text and then quits. How hard would it be to add a mode where it could stay running and be ready to accept more text...

enhancement
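A rough sketch of the requested loop, with `generate_response` as a hypothetical stand-in for the real evaluation code:

```c
#include <stdio.h>
#include <string.h>

// Hypothetical stand-in: a real build would call the inference code here.
static void generate_response(const char *prompt) {
    printf("(model output for: %s)\n", prompt);
}

int main(void) {
    char line[4096];
    for (;;) {
        printf("> ");
        fflush(stdout);                        // the prompt has no newline
        if (!fgets(line, sizeof line, stdin))  // EOF (Ctrl-D) ends the session
            break;
        line[strcspn(line, "\n")] = '\0';      // strip the trailing newline
        if (line[0] == '\0') continue;         // ignore empty lines
        generate_response(line);               // model stays resident between turns
    }
    return 0;
}
```

Keeping the process alive between prompts avoids reloading the weights for every request.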