Stephan Walter
I consider it ready now, but I'm open to revising it or waiting for other changes. I agree that it should be tested on ARM and AVX512.
[Low-bit Quantization of Neural Networks for Efficient Inference](https://arxiv.org/abs/1902.06822) deals with 4-bit quantization specifically. As a smaller step, I can think of these optimizations:

* use F16 for the scaling factor....
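For illustration, a minimal sketch of what a 4-bit block with an F16 scaling factor could look like, assuming a ggml-style block layout; the struct name, `QK`, and the `ggml_fp16_t` stand-in below are placeholders, not the actual ggml definitions:

```c
#include <stdint.h>

// Stand-in for a 16-bit float storage type (ggml has its own half-precision type).
typedef uint16_t ggml_fp16_t;

#define QK 32  // number of weights sharing one scaling factor

// Hypothetical block layout: the scale is stored as F16 instead of F32,
// cutting the per-block overhead from 4 bytes to 2 for every 32 weights.
typedef struct {
    ggml_fp16_t d;          // scaling factor (F16)
    uint8_t     qs[QK / 2]; // 32 x 4-bit quantized values, two per byte
} block_q4_0_f16;
```

With a Q4_0-like layout that shrinks a block from 20 to 18 bytes, i.e. from 5.0 to 4.5 bits per weight.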
The readme still says:

> The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full docker image....
Probably an old hard-coded value: #142?
As the Makefile no longer sets specific instruction-set options but uses `-march=native -mtune=native`, this should no longer occur. Please reopen if you still have the problem with the latest...
Go home Q2, you're drunk ;-)

```
$ ./main -m ./models/7B/ggml-model-q2_0.bin -p "The efforts needed to add this support are so small that there is no reason not to do...
```
Updated my branch with AVX optimizations, probably far from perfect. Still quite slow...

Q2:
```
98.37 seconds per pass - ETA 17.90 hours
[1]147.6625,[2]136.8862,[3]132.6015,[4]127.8629,[5]120.4091,[6]111.7640,[7]114.2548,[8]112.8951,
```
Q3:
```
203.61 seconds per...
```
Agree; I was recently confused by the various type ids (`ggml_type` vs `model.hparams.f16`, which doesn't have an enum). Though I think for performance reasons you can't really put too much...
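To make that concrete, here is a rough sketch of an explicit mapping between the file-level `hparams.f16` integer and a `ggml_type`-style enum; the enum members and helper below are illustrative placeholders, not the real definitions in ggml.h/llama.cpp:

```c
// Illustrative only: the real ggml_type enum lives in ggml.h and may differ.
enum wtype_sketch {
    WTYPE_F32 = 0,
    WTYPE_F16,
    WTYPE_Q4_0,
    WTYPE_Q4_1,
};

// hparams.f16 is just an integer read from the model file with its own numbering;
// a small translation function makes the relationship explicit in one place.
static enum wtype_sketch wtype_from_hparams(int f16) {
    switch (f16) {
        case 0:  return WTYPE_F32;
        case 1:  return WTYPE_F16;
        case 2:  return WTYPE_Q4_0;
        case 3:  return WTYPE_Q4_1;
        default: return WTYPE_F32; // unknown value: caller should report an error
    }
}
```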
clang on macOS is apparently stricter; I'll clean this up using the warnings from the CI run. I'm not sure if the `double` precision is needed in `ggml_compute_forward_rope_f32`/`_f16`.
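For context, a stripped-down sketch of the per-pair rotation a RoPE kernel like `ggml_compute_forward_rope_f32` performs, with the angle tracked in `double`; the function name and loop structure are simplified assumptions, not the actual ggml code. The question is whether accumulating `theta` in `float` would visibly change the output.

```c
#include <math.h>

// Simplified RoPE over one row of n_dims floats at position pos.
// The real kernel walks the whole tensor; this only shows where the
// double-precision angle accumulation enters.
static void rope_row_sketch(float *x, int n_dims, int pos) {
    const double theta_scale = pow(10000.0, -2.0 / n_dims);
    double theta = (double) pos;
    for (int i = 0; i < n_dims; i += 2) {
        const float cos_theta = (float) cos(theta);
        const float sin_theta = (float) sin(theta);
        theta *= theta_scale;

        const float x0 = x[i];
        const float x1 = x[i + 1];
        x[i]     = x0*cos_theta - x1*sin_theta;
        x[i + 1] = x0*sin_theta + x1*cos_theta;
    }
}
```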
> i assume inference speed changes will be minimal, and only really a thing with simd disabled?

I believe `master` got a bit slower recently, but I can't detect a...