Atamurad Hezretkuliyev comments


Results: 3 comments by Atamurad Hezretkuliyev

Quantization Brainstorming

@byte-6174 thanks for linking to my branch! I wanted to add some details and results so far, as my branch is a draft and not documented yet.

### Code structure:

* **QMatrix** -...

Quantization Brainstorming

@mgrabban good catch, thank you! I was wondering why Q8_A wasn't working for weights other than WQ, WK, WV, WO and switched these weights to Q8_B. Why it worked for...

Quantization Brainstorming

I have another data point to add: I had some success running a 4-bit quantized Llama2-7B-chat model with run.c. Speedup is 10x compared to FP32 weights. 4-bit model file size: 4.3GB. Quantization...