
[Feature] Vulkan acceleration for more quantization types

Open TiagoSantos81 opened this issue 9 months ago • 3 comments

System Info

GPT4All 2.5.0 desktop version on Windows 10 x64. Two systems, both with NVIDIA GPUs.

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Reproduction

  1. Load any Mistral base model with Q4_0 quantization, such as the default models in GPT4All Chat, on a GPU with more than 6 GB of free memory.
  2. Change the default pre-loaded model to an equivalent model with Q3_K_M quantization (smaller on disk), and restart the application (due to #1550).
  3. Run a short prompt and check whether the model was loaded onto the GPU, either below the speed metrics or in any memory profiling app (a scripted equivalent is sketched below).
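
Roughly the same experiment can be scripted with the Python bindings. This is a minimal sketch, assuming a 2.x version of the `gpt4all` package and illustrative model filenames; neither is taken from the original report:

```python
# Minimal sketch of the repro via the gpt4all Python bindings (assumed 2.x API).
# The model filenames below are illustrative; use whichever Q4_0 / Q3_K_M GGUFs you have.
from gpt4all import GPT4All

# Q4_0 build: expected to offload to the GPU via Vulkan.
model_q4 = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", device="gpu")
print(model_q4.generate("Say hello.", max_tokens=8))

# Q3_K_M build: smaller on disk, yet in 2.5.0 it runs on the CPU instead.
# Depending on the bindings version, this may fall back silently or raise
# if GPU initialization fails.
model_q3 = GPT4All("mistral-7b-instruct-v0.1.Q3_K_M.gguf", device="gpu")
print(model_q3.generate("Say hello.", max_tokens=8))
```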

Expected behavior

Smaller K_M-quantized GGUF models should fit on the same GPU as Q4_0 and Q5_0 ones.

TiagoSantos81 avatar Oct 22 '23 20:10 TiagoSantos81

Currently, GPU offloading is only supported for models based on the LLaMA or Falcon architecture stored in the Q4_0, Q4_1, fp16, or fp32 formats. If you attempt to load an unsupported model, a message should appear in the lower-right corner while it is generating, indicating that it is using the CPU due to an unsupported model type or format.
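
As a rough illustration of that rule, a filename-based check might look like the sketch below. The supported set comes from the comment above; the helper itself is hypothetical and not part of GPT4All's API (the real check inspects tensor types inside the GGUF file, not the filename):

```python
import re

# Quantizations the Vulkan backend can offload as of GPT4All 2.5.0,
# per the comment above (Q4_0, Q4_1, fp16, fp32).
GPU_OFFLOAD_QUANTS = {"Q4_0", "Q4_1", "F16", "F32"}

def likely_gpu_offloadable(gguf_filename: str) -> bool:
    """Guess from the filename suffix whether the model can be GPU-offloaded.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"\.(Q\d_[01K](?:_[SML])?|F16|F32)\.gguf$",
                      gguf_filename, re.IGNORECASE)
    if not match:
        return False
    quant = match.group(1).upper()
    # K-quants such as Q3_K_M are normalized to their base type (Q3_K).
    base = re.sub(r"_[SML]$", "", quant)
    return base in GPU_OFFLOAD_QUANTS

print(likely_gpu_offloadable("mistral-7b-instruct-v0.1.Q4_0.gguf"))    # True
print(likely_gpu_offloadable("mistral-7b-instruct-v0.1.Q3_K_M.gguf"))  # False
```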

cebtenzzre avatar Oct 22 '23 20:10 cebtenzzre

Hello @cebtenzzre, that does not seem to be the case, at least not in version 2.5.0:

[screenshot]

Nov 02 11:56:41 HOST plasmashell[280926]: ggml_vk_graph_compute: MUL_MAT: Unsupported quantization: 13/0
Nov 02 11:56:41 HOST plasmashell[280926]: ggml_vk_graph_compute: node 942, op = MUL_MAT not implemented

Looks like it's the same result for any quantized model, even LLaMA ones.
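
For what it's worth, the two numbers in that message look like ggml tensor type IDs for the two MUL_MAT operands. A small lookup table, assuming the standard ggml type enum in llama.cpp at the time, decodes them:

```python
# Assumed mapping from ggml_type enum values to names (llama.cpp, late 2023).
GGML_TYPE_NAMES = {
    0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1",
    6: "Q5_0", 7: "Q5_1", 8: "Q8_0", 9: "Q8_1",
    10: "Q2_K", 11: "Q3_K", 12: "Q4_K", 13: "Q5_K", 14: "Q6_K", 15: "Q8_K",
}

# "Unsupported quantization: 13/0" would then mean a Q5_K weight tensor
# multiplied by an F32 activation tensor -- plausible for a Q3_K_M model,
# which stores some of its tensors as Q5_K.
print(GGML_TYPE_NAMES[13], GGML_TYPE_NAMES[0])  # Q5_K F32
```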

DistantThunder avatar Nov 02 '23 10:11 DistantThunder

that does not seem to be the case at least for version 2.5.0:

There is a bug in the detection of unsupported quantizations that was fixed in https://github.com/nomic-ai/llama.cpp/pull/11 and should be resolved in the next release.

cebtenzzre avatar Nov 02 '23 16:11 cebtenzzre