llama.cpp
LLM inference in C/C++
Obviously slower with the Q4_1 30B model, and memory usage becomes garbage... (Linux 5.19 x64, Ubuntu base)
Using the GGML SIMD macros, so hopefully it should work on different architectures, but it has only been tested with AVX2. Don't expect any meaningful performance improvement; the function is not very...
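For context, the macro-abstraction pattern looks roughly like the sketch below. The macro names here are illustrative assumptions, not GGML's actual macros: a small set of generic vector macros maps onto the target's intrinsics at compile time, with a scalar fallback, so the kernel itself is written once.

```c
// Illustrative sketch of SIMD abstraction via macros (hypothetical names,
// not ggml's real ones). Each architecture maps the same generic macros
// onto its own intrinsics; a scalar fallback keeps the code portable.
#include <stddef.h>

#if defined(__AVX2__)
#include <immintrin.h>
#define VEC_STEP       8                      // 8 floats per 256-bit register
#define VEC_LOAD(p)    _mm256_loadu_ps(p)
#define VEC_MUL(a, b)  _mm256_mul_ps(a, b)
#define VEC_STORE(p,v) _mm256_storeu_ps(p, v)
#else
// Scalar fallback: one float at a time, same interface.
#define VEC_STEP       1
#define VEC_LOAD(p)    (*(p))
#define VEC_MUL(a, b)  ((a) * (b))
#define VEC_STORE(p,v) (*(p) = (v))
#endif

// Element-wise multiply, written once against the generic macros.
void vec_mul_f32(size_t n, float *dst, const float *a, const float *b) {
    size_t i = 0;
    for (; i + VEC_STEP <= n; i += VEC_STEP) {
        VEC_STORE(dst + i, VEC_MUL(VEC_LOAD(a + i), VEC_LOAD(b + i)));
    }
    for (; i < n; ++i) {  // leftover tail elements
        dst[i] = a[i] * b[i];
    }
}
```

Compiled with `-mavx2` the loop uses 256-bit registers; on any other target it degrades to plain scalar code with identical results.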
This enables `-Wdouble-promotion` and syncs the `Makefile` and `CMakeLists.txt` with regard to warnings. Reasoning: the llama.cpp codebase depends on the correct use of number types, whether those are `float`, `double`...
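As a minimal sketch of what `-Wdouble-promotion` catches (compile with `gcc -Wdouble-promotion`; the function names below are just for illustration): both flagged lines silently compute in double precision even though only `float` was intended, which costs real performance on hardware without fast double support.

```c
#include <stdio.h>

float scale(float x) {
    float y = x * 0.5;      // warning: 0.5 is a double literal, so x is
                            // promoted to double, then truncated back
    printf("%f\n", y);      // warning: float is promoted to double by the
                            // default argument promotions of variadic calls
    return y;
}

int main(void) {
    // Writing the literal as 0.5f keeps the arithmetic in float and
    // silences the first warning.
    (void)scale(3.0f);
    return 0;
}
```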
I can't run any model because my CPU is from before 2013, so it doesn't have AVX2 instructions. Can you please support AVX-only CPUs?
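For background on why this is feasible: AVX (2011) already provides the 256-bit float operations these kernels mostly need, while AVX2 (2013) mainly adds 256-bit integer instructions. Below is a hedged sketch of compile-time dispatch between an AVX path and a scalar fallback; it is illustrative only, not llama.cpp's actual code.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// Dot product with a compile-time ISA split: the compiler's target flags
// (-mavx vs. nothing) select which body gets built.
float dot_f32(size_t n, const float *a, const float *b) {
#if defined(__AVX__)
    // AVX-era CPUs (e.g. Sandy Bridge, 2011) can take this path even
    // though they lack AVX2.
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_mul_ps(_mm256_loadu_ps(a + i),
                                               _mm256_loadu_ps(b + i)));
    }
    float tmp[8];
    _mm256_storeu_ps(tmp, acc);
    float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3]
              + tmp[4] + tmp[5] + tmp[6] + tmp[7];
    for (; i < n; ++i) sum += a[i] * b[i];  // tail elements
    return sum;
#else
    // Scalar fallback for CPUs without AVX at all.
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
#endif
}
```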
BLOOM models have a more permissive license than LLaMA models and are also multilingual in nature. While there is a project [based on llama.cpp](https://github.com/NouamaneTazi/bloomz.cpp) that can perform inference of BLOOM...
### Discussed in https://github.com/ggerganov/llama.cpp/discussions/446

Originally posted by **cmp-nct** on March 24, 2023: I've been testing Alpaca 30B (`-t 24 -n 2000 --temp 0.2 -b 32 --n_parts 1 --ignore-eos --instruct`) and I've consistently...
I'm not sure whether this is an enhancement request, since it may already be supported. Is it possible to run the full models? I know they take a ton of extra...
When trying to run `./bin/main -m ./models/7B/ggml-model-q4_0.bin -n 128`, Termux throws this output: `bash: ./bin/main: permission denied`
See explanation here: https://github.com/ggerganov/llama.cpp/pull/439
I already quantized my files with this command: `./quantize ./ggml-model-f16.bin.X E:\GPThome\LLaMA\llama.cpp-master-31572d9\models\65B\ggml-model-q4_0.bin.X 2`. The first time, it reduced my file size from 15.9 GB to 4.9 GB, and when I tried to...