llama.cpp
LLM inference in C/C++
On machines with smaller memory and slower processors, it can be useful to reduce the overall number of threads running. For instance, on my MacBook Pro (Intel i5, 16 GB),...
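If it helps, the thread count can be capped with the `-t` flag of `./main`; a minimal sketch, where the model path and prompt are only illustrations:

```
# run with 4 threads instead of the default
./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -n 128 -p "Hello, my name is"
```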
Per [this twitter thread](https://twitter.com/theshawwn/status/1632569215348531201). See commit [here](https://github.com/shawwn/llama/commit/40d99d329a5e38d85904d3a6519c54e6dd6ee9e1).
Hey! Thank you for your amazing work! I'm curious: is it possible to use RLHF feedback after a response to make small incremental adjustments in a tuning process? For example,...
The initial `make` fails with `CLOCK_MONOTONIC undeclared`:

```
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx...
```
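For what it's worth, this usually comes down to feature-test macros: with `-std=c11`, glibc only exposes `CLOCK_MONOTONIC` if `_POSIX_C_SOURCE` (or `_GNU_SOURCE`) is defined before `<time.h>` is included. A minimal standalone sketch of the kind of workaround that tends to help (the other option is simply building with `-std=gnu11` instead of `-std=c11`):

```
/* Define the POSIX feature-test macro before any system header so that
   glibc exposes CLOCK_MONOTONIC and clock_gettime() under -std=c11. */
#define _POSIX_C_SOURCE 199309L

#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);  /* compiles once the macro is defined */
    printf("%lld.%09ld\n", (long long) ts.tv_sec, ts.tv_nsec);
    return 0;
}
```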
I can achieve around 1 token per second on a Ryzen 7 3700X on Linux with the 65B model and 4-bit quantization. If we use 8-bit instead, would it run...
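As a rough back-of-envelope (weights only, ignoring activations and overhead): 65B parameters × 4 bits ≈ 32.5 GB, while 65B parameters × 8 bits ≈ 65 GB. An 8-bit 65B model would therefore roughly double the bytes read per token, and since this workload is largely memory-bandwidth bound, tokens per second would likely drop by about half.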
benchmarks?
Where are the benchmarks for various hardware, e.g. Apple Silicon?
First of all, tremendous work, Georgi! I managed to run your project with some small adjustments on:
- Intel(R) Core(TM) i7-10700T CPU @ 2.00GHz / 16 GB as an x64 app,...
This would be the initial PR to be able to compile on Windows. In particular, MSVC is very picky about which features you can and cannot use. With...
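One concrete example of that pickiness (an illustration of the general problem, not necessarily a change from this PR): MSVC does not support C99 variable-length arrays, so runtime-sized stack buffers have to become heap allocations:

```
#include <stdlib.h>

void process(int n) {
    /* GCC/Clang accept a VLA here, MSVC does not:
       float buf[n]; */
    float * buf = (float *) malloc(n * sizeof(float));  /* portable replacement */
    if (buf == NULL) {
        return;
    }
    /* ... use buf ... */
    free(buf);
}
```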
This prompt with the 65B model on an M1 Max 64 GB results in a segmentation fault. It works with the 30B model. Are there problems with longer prompts? Related to #12 ```...
The `./main` program currently outputs text and then quits. How hard would it be to add a mode where it could stay running and be ready to accept more text...
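A minimal sketch of what such a stay-running mode could look like: read a line from stdin, generate, repeat. The `generate_reply()` helper below is hypothetical and merely stands in for a call into the existing inference code:

```
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a call into the model; the real program would
   run inference on `prompt` here instead of echoing it back. */
static void generate_reply(const char * prompt) {
    printf("(model output for: %s)\n", prompt);
}

int main(void) {
    char line[4096];
    /* Stay running: keep reading prompts from stdin until EOF or "exit". */
    for (;;) {
        printf("> ");
        fflush(stdout);
        if (fgets(line, sizeof(line), stdin) == NULL) break;  /* EOF */
        line[strcspn(line, "\n")] = '\0';                     /* strip newline */
        if (strcmp(line, "exit") == 0) break;
        generate_reply(line);
    }
    return 0;
}
```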