Q4_K implementation for Metal

Open ikawrakow opened this issue 2 years ago • 0 comments

Implemented mostly following the Q4_0 Metal implementation.

Slightly slower than Q4_0: on my 30-core M2 Max GPU, for a 256-token run, it takes 28.1 ms/token compared to 27.0 ms/token for Q4_0.
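
To put the two timings in perspective, here is a minimal sketch that converts ms/token to tokens per second and computes the relative slowdown. The function and constant names are made up for illustration and are not part of llama.cpp; the only inputs are the two numbers quoted above.

```python
# Hypothetical helpers for comparing the two reported timings.
# Nothing here comes from llama.cpp; only the two ms/token figures do.

def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput."""
    return 1000.0 / ms_per_token


def relative_slowdown(candidate_ms: float, baseline_ms: float) -> float:
    """Fraction by which the candidate is slower than the baseline."""
    return candidate_ms / baseline_ms - 1.0


Q4_K_MS_PER_TOKEN = 28.1  # reported for Q4_K on a 30-core M2 Max GPU
Q4_0_MS_PER_TOKEN = 27.0  # reported for Q4_0 on the same machine

q4_k_tps = tokens_per_second(Q4_K_MS_PER_TOKEN)
q4_0_tps = tokens_per_second(Q4_0_MS_PER_TOKEN)
slowdown = relative_slowdown(Q4_K_MS_PER_TOKEN, Q4_0_MS_PER_TOKEN)

print(f"Q4_K: {q4_k_tps:.1f} tokens/s")
print(f"Q4_0: {q4_0_tps:.1f} tokens/s")
print(f"Q4_K is {slowdown * 100:.1f}% slower than Q4_0")
```

That works out to roughly 35.6 tokens/s for Q4_K versus 37.0 tokens/s for Q4_0, i.e. Q4_K is about 4% slower in this measurement.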

ikawrakow, Jun 07 '23 07:06