
Vulkan: Meta-Llama-3.1-8b-128k slow generation.

Open 3Simplex opened this issue 6 months ago • 12 comments

[!NOTE] Until this is fixed, the workaround is to use the CPU or CUDA backend instead.

Bug Report

Vulkan: Meta-Llama-3.1-8b-128k slow generation.

With release 3.1.1 and the Vulkan backend, Meta-Llama-3.1-8b-128k generates extremely slowly (1.5 t/s). This is not a problem on CPU.

Steps to Reproduce

  1. Using GPT4All 3.1.1 with Vulkan
  2. Chat with Meta-Llama-3.1-8b-128k
  3. Generation is immediately slow (1.5 t/s)
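
For anyone who wants to compare backends with a number rather than by eye, here is a minimal sketch of how a tokens-per-second figure like the ones in this report could be measured. It is not how the GPT4All chat UI computes its own t/s readout; `tokens_per_second` is a hypothetical helper that works on any iterable of tokens and makes no assumption about the backend.

```python
import time
from typing import Callable, Iterable, Tuple


def tokens_per_second(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float]:
    """Consume a token stream and return (token_count, tokens_per_second).

    Hypothetical helper for illustration. The timer starts before the
    first token is requested, so prompt processing time is included in
    the measurement.
    """
    start = clock()
    count = 0
    for _ in tokens:
        count += 1
    elapsed = clock() - start
    # Guard against a zero interval on very short or empty streams.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate
```

Assuming the GPT4All Python bindings are used, the stream returned by `model.generate(prompt, streaming=True)` could be passed in as `tokens`, once per backend, to compare the resulting rates.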

Expected Behavior

Using the model with llama.cpp directly reports over 60 t/s. Using the model with GPT4All before 3.1.1, I could get about 30 t/s.

Your Environment

  • GPT4All version: 3.1.1 (release or web_beta)
  • Operating System: Windows
  • Chat model used (if applicable): Vulkan & Meta-Llama-3.1-8b-128k

3Simplex, Jul 29 '24 16:07