llama.cpp Bug: IQ3_M is significantly slower than IQ4

Bug: IQ3_M is significantly slower than IQ4_XS on AMD, is it expected?

Open Nekotekina opened this issue 1 year ago • 3 comments

What happened?

Model: https://huggingface.co/bartowski/gemma-2-27b-it-GGUF
AMD GPU: RX 7600 XT + RX 7600 (full offload)

With IQ3_M I get about 10 t/s, while IQ4_XS gets nearly 15 t/s. I thought smaller models would run faster because they need less memory bandwidth, and both are IQ quants.
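The expectation above can be written as a rough back-of-the-envelope estimate. This is a minimal sketch that assumes token generation is purely memory-bandwidth bound, meaning every weight is read once per generated token. The function name, the bandwidth figure, and the file sizes are hypothetical placeholders for illustration, not measurements from this setup; only the 10 t/s and 15 t/s figures come from the report.

```python
def bandwidth_bound_tps(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if all weights are read once per token.

    Ignores dequantization cost, compute, and multi-GPU overhead.
    """
    if model_size_gb <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("size and bandwidth must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical placeholder values (assumptions, not measured):
HYPOTHETICAL_BANDWIDTH_GB_S = 280.0
HYPOTHETICAL_SIZES_GB = {"IQ3_M": 12.0, "IQ4_XS": 15.0}

bounds = {
    name: bandwidth_bound_tps(size, HYPOTHETICAL_BANDWIDTH_GB_S)
    for name, size in HYPOTHETICAL_SIZES_GB.items()
}

# Under the bandwidth-only model, the smaller quant should be faster.
expected_ratio = bounds["IQ3_M"] / bounds["IQ4_XS"]

# Figures reported in this issue: about 10 t/s (IQ3_M) vs about 15 t/s (IQ4_XS).
observed_ratio = 10.0 / 15.0

print(f"expected IQ3_M / IQ4_XS speed ratio: {expected_ratio:.2f}")
print(f"observed IQ3_M / IQ4_XS speed ratio: {observed_ratio:.2f}")
```

If the observed ratio falls below 1 while the bandwidth-only estimate is above 1, as it does with the reported numbers, the bottleneck is probably not memory bandwidth alone. One possible explanation, not confirmed in this thread, is that the IQ3 kernels spend more time on dequantization on this backend.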

Name and Version

version: 3827 (7691654c) built with Ubuntu clang version 14.0.6-1~kisak1~j for x86_64-pc-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

Sep 25 '24 21:09 Nekotekina

Oops, I'll retest master branch.

Sep 25 '24 21:09 ghost

Retested with latest version, the same result.

Sep 26 '24 07:09 ghost

Potentially related to issue 8760, which also mentions the difference between (IQ1, IQ2, IQ3) and (IQ4 / K).

Sep 26 '24 09:09 BrickBee

On NVIDIA (3090), IQ3_M is faster than IQ4_XS (~40 t/s against ~35 t/s).

Oct 19 '24 15:10 ghost

However, on 1x NVIDIA 3090 (with DDR4 offload), IQ3_S and IQ3_M are slower than IQ4_XS (about 0.5x the speed). It seems that only NVIDIA GPUs can handle IQ3 at high speed.

Nov 13 '24 03:11 grapevine-AI

This issue was closed because it has been inactive for 14 days since being marked as stale.

Dec 29 '24 01:12 github-actions[bot]
