sycl : Implemented reorder Q4_K mmvq
This PR enables the reorder optimization for the Q4_K layout, similarly to https://github.com/ggml-org/llama.cpp/pull/12858 . This branch is based on @Alcpz 's work; until that PR is merged, the easiest way to review this one is to look at the diff against d1f5b2d740970c22de4a1cba7e0df0de0739831e .
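For context, a minimal sketch of what "reorder" means here (this is an illustration, not the actual ggml-sycl code): Q4_K weights are stored as an array of super-blocks, each interleaving packed 4-bit quants, packed scales, and fp16 deltas. The optimization repacks the device buffer struct-of-arrays style so the mmvq kernel reads the quants with contiguous, coalesced accesses. The constants follow ggml's Q4_K definition (`QK_K = 256`); the per-block byte sizes are assumptions based on that format.

```python
QK_K = 256            # elements per Q4_K super-block (ggml constant)
QS_BYTES = QK_K // 2  # 128 bytes of packed 4-bit quants per block
SCALE_BYTES = 12      # packed 6-bit scales/mins per block
DM_BYTES = 4          # two fp16 values (d, dmin) per block

def reorder_q4_k(blocks):
    """blocks: iterable of (qs, scales, dm) byte-strings, one tuple per
    super-block. Returns a single buffer with all quants first, then all
    scales, then all d/dmin pairs, instead of interleaving them per block."""
    qs = b"".join(b[0] for b in blocks)
    scales = b"".join(b[1] for b in blocks)
    dm = b"".join(b[2] for b in blocks)
    return qs + scales + dm
```

The benefit is that neighboring work-items in the dot-product kernel touch neighboring quant bytes, instead of striding over the scales and deltas of each block.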
Some performance numbers below:
**Lunar Lake**

`GGML_SYCL_DISABLE_OPT=0`
| model | size | params | backend | ngl | threads | sm | test | t/s |
|---|---|---|---|---|---|---|---|---|
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | pp512 | 1593.59 ± 79.66 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | tg128 | 41.43 ± 0.49 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | pp512 | 551.60 ± 2.19 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | tg128 | 17.69 ± 1.04 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | pp512 | 590.18 ± 4.57 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | tg128 | 28.36 ± 0.24 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 507.64 ± 0.92 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 13.61 ± 0.07 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | pp512 | 823.78 ± 30.18 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | tg128 | 21.44 ± 0.08 |
build: 105a01d7 (5223)
`GGML_SYCL_DISABLE_OPT=1`
| model | size | params | backend | ngl | threads | sm | test | t/s |
|---|---|---|---|---|---|---|---|---|
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | pp512 | 1624.32 ± 64.90 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | tg128 | 36.27 ± 0.25 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | pp512 | 552.24 ± 1.20 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | tg128 | 12.83 ± 1.24 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | pp512 | 623.69 ± 3.50 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | tg128 | 24.23 ± 0.58 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 508.55 ± 1.01 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 10.21 ± 0.03 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | pp512 | 820.33 ± 30.67 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | tg128 | 17.72 ± 0.06 |
build: 105a01d7 (5223)
**Arc B580 (Battlemage)**

`GGML_SYCL_DISABLE_OPT=0`
| model | size | params | backend | ngl | sm | test | t/s |
|---|---|---|---|---|---|---|---|
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | pp512 | 7963.47 ± 49.91 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | tg128 | 119.66 ± 1.24 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | pp512 | 2251.25 ± 3.16 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | tg128 | 53.63 ± 0.51 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | pp512 | 5899.09 ± 16.46 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | tg128 | 87.05 ± 2.77 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | pp512 | 2116.96 ± 3.79 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | tg128 | 47.78 ± 0.32 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | pp512 | 3247.42 ± 3.66 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | tg128 | 66.47 ± 0.62 |
build: 105a01d7 (5223)
`GGML_SYCL_DISABLE_OPT=1`
| model | size | params | backend | ngl | sm | test | t/s |
|---|---|---|---|---|---|---|---|
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | pp512 | 7900.28 ± 61.92 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | tg128 | 100.15 ± 3.03 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | pp512 | 2250.62 ± 2.25 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | tg128 | 38.05 ± 0.25 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | pp512 | 5925.76 ± 9.85 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | tg128 | 71.27 ± 0.16 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | pp512 | 2114.17 ± 3.93 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | tg128 | 34.39 ± 0.10 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | pp512 | 3265.26 ± 6.07 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | tg128 | 54.89 ± 0.55 |
build: 105a01d7 (5223)
**Arc A770**

`GGML_SYCL_DISABLE_OPT=0`
| model | size | params | backend | ngl | sm | test | t/s |
|---|---|---|---|---|---|---|---|
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | pp512 | 4540.38 ± 8.00 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | tg128 | 44.47 ± 0.15 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | pp512 | 1753.07 ± 2.08 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | tg128 | 32.04 ± 0.22 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | pp512 | 3785.29 ± 6.46 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | tg128 | 38.65 ± 0.33 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | pp512 | 1702.11 ± 2.83 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | tg128 | 29.26 ± 0.07 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | pp512 | 2534.60 ± 0.94 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | tg128 | 34.11 ± 0.32 |
build: 105a01d7 (5223)
`GGML_SYCL_DISABLE_OPT=1`
| model | size | params | backend | ngl | sm | test | t/s |
|---|---|---|---|---|---|---|---|
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | pp512 | 4532.79 ± 9.10 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | none | tg128 | 44.17 ± 0.39 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | pp512 | 1749.38 ± 2.50 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | none | tg128 | 26.03 ± 0.02 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | pp512 | 3774.80 ± 2.61 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | none | tg128 | 35.51 ± 0.08 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | pp512 | 1702.25 ± 1.93 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | none | tg128 | 23.40 ± 0.23 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | pp512 | 2535.88 ± 3.86 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | none | tg128 | 30.36 ± 0.39 |
build: 105a01d7 (5223)
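To put the tables above in perspective, the tg128 gains from the reorder path can be computed directly from the reported throughput; a quick sanity-check script (numbers copied from the tables, reorder on = `GGML_SYCL_DISABLE_OPT=0`, off = `=1`):

```python
# (device, model) -> (tg128 t/s with reorder, tg128 t/s without)
tg128 = {
    ("Lunar Lake", "llama 7B"): (17.69, 12.83),
    ("Lunar Lake", "llama 8B"): (13.61, 10.21),
    ("Arc B580",   "llama 7B"): (53.63, 38.05),
    ("Arc B580",   "llama 8B"): (47.78, 34.39),
    ("Arc A770",   "llama 7B"): (32.04, 26.03),
    ("Arc A770",   "llama 8B"): (29.26, 23.40),
}
for (dev, model), (opt, base) in tg128.items():
    print(f"{dev:10s} {model}: {(opt / base - 1) * 100:+.0f}%")
```

The token-generation speedups range from roughly +23% (A770) to +41% (B580) for the llama models, while pp512 stays essentially flat, as expected for an mmvq-only change.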
@sgeor255 Here is a discussion about Q4_K: https://github.com/ggml-org/llama.cpp/discussions/13120#discussioncomment-12957458 Could you test the model from that discussion with this PR? If the result is good, could you reply with your test result?
We need to promote the SYCL backend in related cases. :)
I rebased the PR on @Alcpz 's latest changes and updated the description with more performance numbers.
@NeoZhangJianyu to answer your questions:
- Could you share the GPU type of the above test results?

I updated the PR description with results from more devices.

- Have you tested the PR with the local unit tests?

Unit tests pass locally (if I understood the question correctly).

- Could you check the detailed output of a Q4_K LLM? I guess the output should be different from the legacy code.

I ran the example script with Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf; output below:
master @ 8936784f7a1ec4f91637d04b77fdc90ec36ebac9
sampler seed: 0
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = 400, n_keep = 0
user
Building a website can be done in 10 simple steps:
Step 1:assistant
Here are the 10 simple steps to build a website:
**Step 1: Plan Your Website**
Define the purpose, target audience, and goals of your website. Determine what type of content you will feature, and what features you want to include (e.g., e-commerce, blog, contact form).
**Step 2: Choose a Domain Name**
Register a unique and memorable domain name that reflects your website's identity and is easy to spell. Make sure to check if the name is available and not already taken by someone else.
**Step 3: Select a Web Host**
Choose a reliable web hosting service that meets your website's needs in terms of storage, bandwidth, and technical support. Some popular options include Bluehost, HostGator, and SiteGround.
**Step 4: Design Your Website**
Use a website builder tool or a content management system (CMS) like WordPress to design and layout your website. Choose a theme or template that is responsive and user-friendly.
**Step 5: Add Content**
Populate your website with high-quality content, including text, images, videos, and other multimedia elements. Make sure to optimize your content for search engines (SEO).
**Step 6: Install a CMS (Optional)**
If you want to have more control over your website's design and functionality, install a CMS like WordPress, Joomla, or Drupal.
**Step 7: Set Up Navigation**
Create a logical and intuitive navigation menu that allows visitors to easily find and access different parts of your website.
**Step 8: Add Features and Functionality**
Add features and functionality to your website, such as contact forms, email newsletters, and e-commerce functionality (if applicable).
**Step 9: Test and Launch**
Test your website thoroughly to ensure that it is stable, secure, and functions as intended. Launch your website and make it available to the public.
**Step 10: Maintain and Update**
Regularly update your website's content, plugins, and
This PR
sampler seed: 0
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = 400, n_keep = 0
user
Building a website can be done in 10 simple steps:
Step 1:assistant
Here are the 10 simple steps to build a website:
**Step 1: Plan Your Website**
Define the purpose, target audience, and goals of your website. Determine what type of content you will feature, and what features you want to include (e.g., e-commerce, blog, contact form).
**Step 2: Choose a Domain Name**
Register a unique and memorable domain name that reflects your website's identity. Ensure it's easy to spell and remember, and consider the extension (e.g., .com, .net, .io).
**Step 3: Select a Web Hosting Service**
Choose a reliable web hosting service that meets your needs (e.g., bandwidth, storage, customer support). Consider factors like uptime, security, and scalability.
**Step 4: Plan Your Content**
Develop a content strategy that includes writing engaging articles, creating high-quality images, and planning a content calendar.
**Step 5: Design Your Website**
Create a visually appealing and user-friendly website design using a website builder, design software, or by hiring a professional designer.
**Step 6: Choose a Content Management System (CMS)**
Select a CMS like WordPress, Joomla, or Drupal that suits your needs and allows for easy content management.
**Step 7: Install and Customize Your CMS**
Install the CMS and customize it to your liking using themes, plugins, and widgets.
**Step 8: Create and Add Content**
Write and publish engaging content, add images and multimedia, and optimize it for search engines.
**Step 9: Test and Launch**
Test your website for bugs, usability issues, and performance. Launch your website and make any final adjustments.
**Step 10: Maintain and Update**
Regularly update your website with fresh content, fix bugs, and keep your CMS and plugins up-to-date to ensure a smooth user experience and maintain search engine rankings.
Let me know if you'd like me to expand on any of these steps!
@sgeor255 I cannot resolve my comments (the "Resolve conversation" button just isn't there for me), so consider them resolved 👍🏻
@NeoZhangJianyu there's a small improvement for this model too:
`GGML_SYCL_DISABLE_OPT=0`
| model | size | params | backend | ngl | threads | sm | test | t/s |
|---|---|---|---|---|---|---|---|---|
| qwen2 7B Q4_K - Medium | 4.36 GiB | 7.62 B | SYCL | 99 | 8 | none | pp512 | 3681.78 ± 24.68 |
| qwen2 7B Q4_K - Medium | 4.36 GiB | 7.62 B | SYCL | 99 | 8 | none | tg128 | 62.10 ± 0.27 |
build: 105a01d7 (5223)
`GGML_SYCL_DISABLE_OPT=1`
| model | size | params | backend | ngl | threads | sm | test | t/s |
|---|---|---|---|---|---|---|---|---|
| qwen2 7B Q4_K - Medium | 4.36 GiB | 7.62 B | SYCL | 99 | 8 | none | pp512 | 3721.85 ± 16.25 |
| qwen2 7B Q4_K - Medium | 4.36 GiB | 7.62 B | SYCL | 99 | 8 | none | tg128 | 45.49 ± 0.16 |
build: 105a01d7 (5223)
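From the two tables above, prompt processing is essentially unchanged while token generation improves noticeably; a quick check of the deltas (numbers copied from the tables):

```python
# qwen2 7B Q4_K - Medium: reorder on (DISABLE_OPT=0) vs off (=1)
pp512 = (3681.78, 3721.85)
tg128 = (62.10, 45.49)
print(f"pp512: {(pp512[0] / pp512[1] - 1) * 100:+.1f}%")  # ~ -1%
print(f"tg128: {(tg128[0] / tg128[1] - 1) * 100:+.1f}%")  # ~ +37%
```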
This PR is now rebased on master as #12858 was merged.
I find that the referenced PR https://github.com/ggml-org/llama.cpp/pull/12858 has performance and wrong-result issues. Please hold this PR until https://github.com/ggml-org/llama.cpp/pull/12858 is confirmed.
Merging now since this PR includes an important fix for the reorder optimization, mentioned here: https://github.com/ggml-org/llama.cpp/pull/13109#discussion_r2073187875 I think the major concerns have been answered.