Results: 99 comments of Stephan Walter

@ikawrakow did that in #896, see `kQuantizeQ4` in `ggml_extra.cpp`, but that's for a new quantization scheme. https://github.com/ggerganov/llama.cpp/blob/6bfb00a53b1a06e209f1b814356dd79ee96b89af/ggml_extra.cpp#L287-L291 It did indeed speed things up. This could probably be integrated into `llama_model_quantize_internal`...
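
Roughly, the speed-up comes from quantizing independent row ranges on separate threads. A minimal sketch of that pattern follows; `quantize_chunk`/`quantize_parallel` are placeholder names (this is not the actual `kQuantizeQ4` code), and a trivial int8 truncation stands in for the real kernel:

```c
#include <pthread.h>
#include <stdint.h>

// Hypothetical per-range worker arguments.
typedef struct {
    const float *src;
    int8_t      *dst;
    int          first, last;  // row range [first, last)
    int          row_size;     // elements per row
} chunk_args;

static void *quantize_chunk(void *p) {
    chunk_args *a = (chunk_args *) p;
    for (int i = a->first * a->row_size; i < a->last * a->row_size; i++) {
        a->dst[i] = (int8_t) a->src[i];  // placeholder for the real kernel
    }
    return NULL;
}

// Split the rows into contiguous, non-overlapping chunks, one worker per
// chunk; no locking is needed because the output ranges are disjoint.
static void quantize_parallel(const float *src, int8_t *dst,
                              int nrows, int row_size, int nthread) {
    enum { MAX_THREADS = 16 };
    pthread_t  tid[MAX_THREADS];
    chunk_args args[MAX_THREADS];
    if (nthread > MAX_THREADS) nthread = MAX_THREADS;
    const int per_thread = (nrows + nthread - 1) / nthread;
    int n = 0;
    for (int t = 0; t < nthread; t++) {
        const int first = t * per_thread;
        int last = first + per_thread;
        if (last > nrows) last = nrows;
        if (first >= last) break;
        args[n] = (chunk_args) { src, dst, first, last, row_size };
        pthread_create(&tid[n], NULL, quantize_chunk, &args[n]);
        n++;
    }
    for (int i = 0; i < n; i++) pthread_join(tid[i], NULL);
}
```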

There is now an assert that checks `mem_buffer`, even in non-debug builds: https://github.com/ggerganov/llama.cpp/blob/173d0e6419e8f8f3c1f4f13201b777f4c60629f3/ggml.c#L4571 Closing this as it's quite old; please re-open if you still encounter the problem with a recent...
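
For reference, the usual pattern for an assert that survives release builds; ggml's `GGML_ASSERT` works along these lines (a sketch, `MY_ASSERT` is a placeholder name):

```c
#include <stdio.h>
#include <stdlib.h>

// Unlike <assert.h> asserts, which vanish when NDEBUG is defined,
// this check also runs in release builds and aborts with a location.
#define MY_ASSERT(x)                                                   \
    do {                                                               \
        if (!(x)) {                                                    \
            fprintf(stderr, "ASSERT: %s:%d: %s\n",                     \
                    __FILE__, __LINE__, #x);                           \
            abort();                                                   \
        }                                                              \
    } while (0)
```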

It's also missing from the description for `--outtype`. According to the README, you would use `quantize` if you wanted q4_0 or q4_1, right?

Well, #1083 was a bit rushed IMO, but I tried to address the loose ends. For the horizontal sum of ints, I could not see a difference in speed between...
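
For context, the two common ways to horizontally sum eight 32-bit ints in a `__m256i` are sketched below; these are likely the kind of variants in question (function names are mine, AVX2 assumed):

```c
#include <immintrin.h>

// Variant A: reduce 256 -> 128 bits, then two pairwise hadds.
static inline int hsum_i32_hadd(__m256i v) {
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(v),
                              _mm256_extracti128_si256(v, 1));
    s = _mm_hadd_epi32(s, s);  // {a+b, c+d, a+b, c+d}
    s = _mm_hadd_epi32(s, s);  // every lane now holds the total
    return _mm_cvtsi128_si32(s);
}

// Variant B: same 256 -> 128 reduction, then shuffle + add twice.
static inline int hsum_i32_shuffle(__m256i v) {
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(v),
                              _mm256_extracti128_si256(v, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
}
```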

Finally, I don't think there is a speed difference in the horizontal sums. I have now finished the AVX optimization for `quantize_row_q8_0`, but I'm not sure I can trust the...
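
For checking an AVX path like this, a scalar reference is handy. A minimal sketch of Q8_0 quantization as I understand the format here, assuming the block layout is one float scale plus 32 int8 values:

```c
#include <math.h>
#include <stdint.h>

#define QK8_0 32

// Assumed block layout: one scale plus QK8_0 quantized values.
typedef struct {
    float  d;           // scale
    int8_t qs[QK8_0];   // quantized values
} block_q8_0;

// Scalar reference: per block, d = max|x| / 127, q = round(x / d).
static void quantize_row_q8_0_ref(const float *x, block_q8_0 *y, int k) {
    const int nb = k / QK8_0;
    for (int i = 0; i < nb; i++) {
        float amax = 0.0f;  // absolute max over the block
        for (int j = 0; j < QK8_0; j++) {
            const float v = fabsf(x[i*QK8_0 + j]);
            if (v > amax) amax = v;
        }
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < QK8_0; j++) {
            y[i].qs[j] = (int8_t) roundf(x[i*QK8_0 + j] * id);
        }
    }
}
```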

Did it work for you with commit 2a2e63c, and can you narrow down the commit that broke it? In #1237, I changed some `size_t` parameters to `int`; I'm now worried...
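
The worry is the usual narrowing hazard: a `size_t` larger than `INT_MAX` has an implementation-defined result (typically a negative value) when converted to `int`. A standalone illustration, with a hypothetical `process` callee standing in for the narrowed signatures:

```c
#include <limits.h>
#include <stdio.h>

// Hypothetical callee taking the narrowed parameter.
static void process(int n) {
    // On typical 64-bit targets this prints a negative size.
    printf("n = %d\n", n);
}

int main(void) {
    size_t big = (size_t) INT_MAX + 2;  // 2147483649 on 64-bit
    process((int) big);                 // implementation-defined result,
                                        // commonly wraps to -2147483647
    return 0;
}
```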

No complaints after three weeks; let's assume this is fixed, possibly by #252.

The Python dependencies in `.devops/full.Dockerfile` should also be updated; this will conflict with my PR #293.

Presumably fixed by #563; please re-open if it's still an issue with a recent revision.