llama.cpp

Excessively slow prompt processing time with 70B partially offloaded in SYCL

Open Jacoby1218 opened this issue 1 year ago • 5 comments

Prompt processing is extremely slow with a 70B model partially offloaded.

llama-bench.exe -ngl 20 -m "D:\models\lzlv_70b_fp16_hf.Q4_K_M.gguf"

Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL | 20 | pp 512 | 2.14 ± 0.28 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL | 20 | tg 128 | 1.03 ± 0.01 |

build: a28c5eff (2045)

Jacoby1218, Feb 02 '24 04:02

Hi @Jacoby1218, could you provide some reference data to show the magnitude of the gap? For example, performance on an RTX-4070ti (16 GB), or running entirely on the iGPU/CPU?

airMeng, Feb 02 '24 06:02

I don't have any other GPU to test with, but I can provide results from my CPU and from other backends.

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | BLAS | 6 | pp 512 | 1.93 ± 0.06 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | BLAS | 6 | tg 128 | 0.81 ± 0.02 |

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | Vulkan | 20 | pp 512 | 7.02 ± 0.25 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | Vulkan | 20 | tg 128 | 0.97 ± 0.04 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | OpenCL | 20 | pp 512 | 8.81 ± 1.10 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | OpenCL | 20 | tg 128 | 0.82 ± 0.02 |

Jacoby1218, Feb 02 '24 07:02
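
To put the gap in numbers, the figures posted so far can be compared directly. The following is only a minimal sketch: the t/s values are copied from the llama-bench results in this thread, while the dictionary names and the `relative_to` helper are made up for illustration, and the reported ± margins are ignored.

```python
# Throughput in tokens/second, copied from the llama-bench results in this thread.
# pp512 = prompt processing of 512 tokens, tg128 = generation of 128 tokens.
pp512 = {"SYCL": 2.14, "BLAS (CPU)": 1.93, "Vulkan": 7.02, "OpenCL": 8.81}
tg128 = {"SYCL": 1.03, "BLAS (CPU)": 0.81, "Vulkan": 0.97, "OpenCL": 0.82}


def relative_to(results, baseline):
    """Return each backend's throughput as a multiple of the baseline backend.

    Hypothetical helper for this comparison; not part of llama.cpp.
    """
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


pp_ratio = relative_to(pp512, "SYCL")
tg_ratio = relative_to(tg128, "SYCL")

for name in pp512:
    print(f"{name:12s} pp512 x{pp_ratio[name]:.2f}  tg128 x{tg_ratio[name]:.2f}")
```

On these numbers, Vulkan and OpenCL process the prompt roughly 3.3x and 4.1x faster than SYCL at the same -ngl 20, and SYCL is only slightly ahead of the CPU-only BLAS run. Token generation differs far less between backends, so the gap is specific to prompt processing.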

I think this may be due to a lack of optimization for multi-batch processing, which has been recorded in https://github.com/ggerganov/llama.cpp/discussions/5277. Please stay tuned!

airMeng, Feb 02 '24 08:02

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Mar 18 '24 01:03

I think this has been improved by https://github.com/ggerganov/llama.cpp/pull/6217. Please give it a try.

airMeng, Mar 24 '24 13:03

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot], May 09 '24 01:05