Eval bug: Q4_K_M with vulkan generates garbage/repetitive output

Open eiffel31 opened this issue 1 month ago • 8 comments

Name and Version

version: 6933 (fcfce040e) built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

Vulkan

Hardware

i5-1135G7

Models

ggml-org/gemma-3-1b-it-Q4_K_M.gguf

Problem description & steps to reproduce

When I run llama-cli with this model on the Vulkan backend, every prompt produces text that is unrelated to the prompt. Often a short sequence is repeated endlessly.

Note:

  • the same model works fine on CPU, so this looks like a Vulkan backend problem
  • Q8_0 and f16 models work fine with the Vulkan backend, so the problem seems specific to Q4
  • the behavior seems to depend on the previous Vulkan run, so it may be an initialization problem
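
The comparisons in the notes above can be laid out as a small test matrix. This is only a sketch: the binary path and the model filenames are hypothetical placeholders, and using -ngl 0 on a Vulkan build as the "CPU" case is an assumption (a separate CPU-only build would be the stricter comparison). The flags used (-m, -ngl, -p, -n, --seed, --temp, -no-cnv) are assumed to be available in the llama-cli build under test.

```python
import itertools

# Hypothetical paths; adjust to the local setup.
LLAMA_CLI = "./llama-cli"
MODELS = {
    "Q4_K_M": "gemma-3-1b-it-Q4_K_M.gguf",
    "Q8_0": "gemma-3-1b-it-Q8_0.gguf",
}
# Assumption: 0 offloaded layers stands in for "CPU", 99 for "fully on Vulkan".
GPU_LAYERS = {"cpu": 0, "vulkan": 99}


def build_matrix(prompt="go", n_predict=64, seed=1):
    """Return one llama-cli command line per (quantization, backend) pair."""
    runs = []
    for (quant, path), (backend, ngl) in itertools.product(
        MODELS.items(), GPU_LAYERS.items()
    ):
        argv = [
            LLAMA_CLI,
            "-m", path,
            "-ngl", str(ngl),
            "-p", prompt,
            "-n", str(n_predict),
            "--seed", str(seed),
            "--temp", "0",   # greedy sampling so runs are comparable
            "-no-cnv",       # one-shot generation instead of interactive chat
        ]
        runs.append({"quant": quant, "backend": backend, "argv": argv})
    return runs


for run in build_matrix():
    print(run["quant"], run["backend"], " ".join(run["argv"]))
```

If only the Q4_K_M + vulkan row produces garbage, that matches the observations listed above.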

First Bad Commit

No response

Relevant log output

Examples of truncated generation:
> go
부드럽게 바꾸기 100%  화이트보드
(필요한 경우)
1.  이름
2.  연락처
3.  이메일
4.  프로필 사진

> go
 hubulong 2023 + 2024 to a complex system. This would involve many layers and complex interactions. The project would aim to create a truly dynamic system.  The complexity is a core element of the project, ensuring the system isn't static.


> go
 April 1st 2024 2024 to 2025 [2025] to 2026 - 2027 - 2028  2029 - 2030  2031 - 2032 to 2033 - 2034 - 2035 - 2036 - 2037 - 2038 - 2039 - 2040 - 2041 - 2042 - 2043 - 2044 - 2045 - 2046 - 2047 - 2048 - 2049 - 2050 - 2051 - 2052 - 2053 - 2054 - 2055 - 2056 - 2057 - 2058 - 2059 - 2060 - 2061 - 2062 - 2063 - 2064 - 2065 - 2066 - 2067 - 2068 - 2069


> go
부활 13th 5th 4th 3rd 2nd 1st 0th 9th 8th 7th 6th 5th 4th 3rd 2nd 1st 0th 9th 8th 7th 6th 5th 4th 3rd 2nd 1st 0th 9th 8th 7th 6th 5th 4th 3rd 2nd 1st 0th 9th 8th 7th

> go
부드러운 2780809699
and 36300356589
and 29314271987
and 54752771472
and 38856176395
and 23827691788

> go
’er-ish’s’ to ‘er’s’ + ‘er’s’ and ‘er’s’ to ‘er’s’ + ‘er’s’ and so on to 9 + 1 to 9 + 1.
This is a complex expression and it's not a standard programming concept. It's a method of generating a long string of characters.
Let's try a simple example:
Input: "hello"
Output: "helloworld"
Input: "world"
Output: "helloworld"
Input: "hello"
Output: "helloworld"

> go
부드러운 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1

> go
부침과 같은 것과 같은 것과 같은 것
와 같은 것.
**설명:**
이것은 매우 복잡하고 난해한 질문이며, 그에 대한 답을 얻는 것은 매우 어려운 일입니다. 특히 복잡한 수학적 개념과 개념에 대한 이해를 필요로 합니다.
**이 질문의 핵심은 다음과 같습니다.**

> go
ensical 92/112/112-2023-9.
9.26. 3.23. 14.21. 13.53. 17.17. 16.20. 13.93. 16.61. 15.11. 17.30. 14.73. 14.13. 14.12. 14.13. 14.13. 14.13. 14.13. 14.13. 14.13. 14.13. 14.13. 14.13. 14.13.

> go
부드럽게 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95% + 95%
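
For anyone who wants to script a check (for example while bisecting), the looping seen in several samples above can be flagged with a simple heuristic. This is a rough sketch, not part of llama.cpp: the function name and the max_period default are made up for illustration. It splits on whitespace, so it will miss loops without spaces such as the 1.1.1.1... sample.

```python
def max_consecutive_repeats(text, max_period=8):
    """Largest number of back-to-back repetitions of any short token group.

    Tokens are whitespace-separated words; groups of 1..max_period tokens
    are considered. Returns 0 for empty text and 1 when nothing repeats.
    """
    tokens = text.split()
    best = 1 if tokens else 0
    for period in range(1, max_period + 1):
        run = 1
        i = period
        while i + period <= len(tokens):
            if tokens[i:i + period] == tokens[i - period:i]:
                # Same group as the one right before it: extend the run.
                run += 1
                best = max(best, run)
                i += period
            else:
                run = 1
                i += 1
    return best


def looks_degenerate(text, threshold=5):
    """Heuristic flag: some token group repeats at least `threshold` times."""
    return max_consecutive_repeats(text) >= threshold
```

A threshold of 5 is an arbitrary starting point; normal prose rarely repeats the same group that often in a row.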

eiffel31 avatar Nov 03 '25 09:11 eiffel31

Which driver version do you have installed? With Ubuntu 22.04, it could be quite outdated. I think upgrading it (for example with a PPA) and seeing if the issue persists would be a good first step to figure this out.

0cc4m avatar Nov 03 '25 16:11 0cc4m

Vulkan Instance Version: 1.3.239. I am not sure this is the version you were looking for. The system is Debian 12.

eiffel31 avatar Nov 03 '25 17:11 eiffel31

No, the actual driver version is most relevant. You can get it by running vulkaninfo --summary and posting the output here.
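
If it helps, the relevant fields can be pulled out of that output with a few lines of Python. This is a minimal sketch that assumes the usual layout of vulkaninfo --summary (a "GPUn:" line followed by indented "key = value" lines); the function names are mine, not part of any tool.

```python
import re
import subprocess


def parse_vulkaninfo_summary(text):
    """Return one dict of properties per 'GPUn:' block in the summary."""
    devices = []
    current = None
    for line in text.splitlines():
        if re.match(r"^\s*GPU\d+:\s*$", line):
            current = {}
            devices.append(current)
            continue
        m = re.match(r"^\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
    return devices


def read_summary():
    """Run vulkaninfo --summary and parse it (requires vulkan-tools)."""
    out = subprocess.run(
        ["vulkaninfo", "--summary"], capture_output=True, text=True, check=True
    ).stdout
    return parse_vulkaninfo_summary(out)


# Small sample in the assumed layout, for trying the parser without a GPU.
SAMPLE = """\
Devices:
========
GPU0:
\tapiVersion         = 1.3.230
\tdriverVersion      = 22.3.6
\tdeviceName         = Intel(R) Xe Graphics (TGL GT2)
\tdriverInfo         = Mesa 22.3.6
"""

for dev in parse_vulkaninfo_summary(SAMPLE):
    print(dev["deviceName"], "|", dev["driverInfo"])
```

The driverInfo and driverVersion fields are the ones that matter here; the instance version only describes the loader.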

0cc4m avatar Nov 04 '25 08:11 0cc4m

========== VULKANINFO

Vulkan Instance Version: 1.3.239

Instance Extensions: count = 20

VK_EXT_acquire_drm_display : extension revision 1
VK_EXT_acquire_xlib_display : extension revision 1
VK_EXT_debug_report : extension revision 10
VK_EXT_debug_utils : extension revision 2
VK_EXT_direct_mode_display : extension revision 1
VK_EXT_display_surface_counter : extension revision 1
VK_KHR_device_group_creation : extension revision 1
VK_KHR_display : extension revision 23
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2 : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2 : extension revision 1
VK_KHR_portability_enumeration : extension revision 1
VK_KHR_surface : extension revision 25
VK_KHR_surface_protected_capabilities : extension revision 1
VK_KHR_wayland_surface : extension revision 6
VK_KHR_xcb_surface : extension revision 6
VK_KHR_xlib_surface : extension revision 6

Instance Layers: count = 3

VK_LAYER_INTEL_nullhw INTEL NULL HW 1.1.73 version 1
VK_LAYER_MESA_device_select Linux device selection layer 1.3.211 version 1
VK_LAYER_MESA_overlay Mesa Overlay layer 1.3.211 version 1

Devices:

GPU0:
    apiVersion = 1.3.230
    driverVersion = 22.3.6
    vendorID = 0x8086
    deviceID = 0x9a49
    deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
    deviceName = Intel(R) Xe Graphics (TGL GT2)
    driverID = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
    driverName = Intel open-source Mesa driver
    driverInfo = Mesa 22.3.6
    conformanceVersion = 1.3.0.0
    deviceUUID = ff258cf4-4865-a82b-e58f-77bffa8e3040
    driverUUID = da807cc5-e5c9-2add-5541-8357feabd0cc
GPU1:
    apiVersion = 1.3.230
    driverVersion = 0.0.1
    vendorID = 0x10005
    deviceID = 0x0000
    deviceType = PHYSICAL_DEVICE_TYPE_CPU
    deviceName = llvmpipe (LLVM 15.0.6, 256 bits)
    driverID = DRIVER_ID_MESA_LLVMPIPE
    driverName = llvmpipe
    driverInfo = Mesa 22.3.6 (LLVM 15.0.6)
    conformanceVersion = 1.3.1.1
    deviceUUID = 6d657361-3232-2e33-2e36-000000000000
    driverUUID = 6c6c766d-7069-7065-5555-494400000000

eiffel31 avatar Nov 05 '25 04:11 eiffel31

That is in fact quite old. Please try upgrading the Mesa driver, for example with https://launchpad.net/~kisak/+archive/ubuntu/turtle

0cc4m avatar Nov 05 '25 08:11 0cc4m

I just tested with an updated driver. The result is unchanged.

Details about the new version:

========== VULKANINFO

Vulkan Instance Version: 1.4.309

Instance Extensions: count = 24

VK_EXT_acquire_drm_display : extension revision 1
VK_EXT_acquire_xlib_display : extension revision 1
VK_EXT_debug_report : extension revision 10
VK_EXT_debug_utils : extension revision 2
VK_EXT_direct_mode_display : extension revision 1
VK_EXT_display_surface_counter : extension revision 1
VK_EXT_headless_surface : extension revision 1
VK_EXT_surface_maintenance1 : extension revision 1
VK_EXT_swapchain_colorspace : extension revision 5
VK_KHR_device_group_creation : extension revision 1
VK_KHR_display : extension revision 23
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2 : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2 : extension revision 1
VK_KHR_portability_enumeration : extension revision 1
VK_KHR_surface : extension revision 25
VK_KHR_surface_protected_capabilities : extension revision 1
VK_KHR_wayland_surface : extension revision 6
VK_KHR_xcb_surface : extension revision 6
VK_KHR_xlib_surface : extension revision 6
VK_LUNARG_direct_driver_loading : extension revision 1

Instance Layers: count = 3

VK_LAYER_INTEL_nullhw INTEL NULL HW 1.1.73 version 1
VK_LAYER_MESA_device_select Linux device selection layer 1.4.303 version 1
VK_LAYER_MESA_overlay Mesa Overlay layer 1.4.303 version 1

Devices:

GPU0:
    apiVersion = 1.4.305
    driverVersion = 25.0.7
    vendorID = 0x8086
    deviceID = 0x9a49
    deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
    deviceName = Intel(R) Iris(R) Xe Graphics (TGL GT2)
    driverID = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
    driverName = Intel open-source Mesa driver
    driverInfo = Mesa 25.0.7-2
    conformanceVersion = 1.4.0.0
    deviceUUID = 8680499a-0100-0000-0002-000000000000
    driverUUID = 380ec810-5d64-cbbd-0034-d8c5a796bd6c
GPU1:
    apiVersion = 1.4.305
    driverVersion = 0.0.1
    vendorID = 0x10005
    deviceID = 0x0000
    deviceType = PHYSICAL_DEVICE_TYPE_CPU
    deviceName = llvmpipe (LLVM 19.1.7, 256 bits)
    driverID = DRIVER_ID_MESA_LLVMPIPE
    driverName = llvmpipe
    driverInfo = Mesa 25.0.7-2 (LLVM 19.1.7)
    conformanceVersion = 1.3.1.1
    deviceUUID = 6d657361-3235-2e30-2e37-2d3200000000
    driverUUID = 6c6c766d-7069-7065-5555-494400000000

eiffel31 avatar Nov 07 '25 18:11 eiffel31

Thank you for testing that. Can you check if #17108 resolves it? If not, it's most likely the same Intel problem as discussed in #17106.

0cc4m avatar Nov 09 '25 08:11 0cc4m

With GGML_VK_DISABLE_F16 set, the problem is gone. Version 7003 does not solve the problem.
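
For others who want to try the same A/B comparison, here is a rough sketch of toggling the variable from a script. The helper names are hypothetical. As far as I can tell the Vulkan backend only checks whether the variable is present, not its value, so the baseline removes it instead of setting it to 0; please treat that as an assumption.

```python
import os
import subprocess

FLAG = "GGML_VK_DISABLE_F16"


def vulkan_env(disable_f16, base=None):
    """Copy the environment with the flag either present or fully removed."""
    env = dict(os.environ if base is None else base)
    env.pop(FLAG, None)  # assumption: presence alone enables the workaround
    if disable_f16:
        env[FLAG] = "1"
    return env


def run_once(argv, disable_f16, timeout=300):
    """Run one command (e.g. a llama-cli invocation) and return its stdout."""
    result = subprocess.run(
        argv,
        env=vulkan_env(disable_f16),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

Calling run_once with the same argv and seed, once with disable_f16=False and once with True, gives two outputs to compare side by side.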

eiffel31 avatar Nov 09 '25 18:11 eiffel31

I can reproduce this, though instead of garbage I just get a stream of unused tokens (unused24 if fully on GPU, otherwise random ones). The problem goes away with -ngl 0, at the expense of performance. The environment variable provided by eiffel31 also works perfectly.

Hex4dec avatar Nov 12 '25 17:11 Hex4dec

I just tested version 7108; the wrong behavior is still there.

eiffel31 avatar Nov 20 '25 09:11 eiffel31

Yeah, we haven't done anything about that yet. I can take another look, but I didn't find consistent behaviour last time. It may be a driver bug.

0cc4m avatar Nov 20 '25 09:11 0cc4m