[WIP] gemm block quantization for llm decoder style
- [ ] move int468 dequant code to x86/arm/... create_pipeline
- [ ] how to encode block_size and storage type?
- [ ] try fp4 e2m1/e3 type?
- [ ] comp table?
- [ ] expand to more general gemm?
- [ ] port union hack to platform-independent style
- [ ] gemm test++
- [ ] doc++
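
For readers unfamiliar with the term, block quantization can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: the `BlockQuantized` struct and the `quantize_blocks` / `dequantize_blocks` helpers are hypothetical names. The scheme is assumed to be symmetric, with one float scale per block, and codes are kept unpacked as one `int8_t` per weight for clarity.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not an ncnn type): one float scale per block
// plus one signed integer code per weight, stored unpacked.
struct BlockQuantized
{
    int bits;       // 4, 6 or 8
    int block_size; // number of consecutive weights sharing one scale
    std::vector<float> scales;
    std::vector<int8_t> codes;
};

// Symmetric quantization: within each block, the largest magnitude maps to qmax.
static BlockQuantized quantize_blocks(const std::vector<float>& w, int bits, int block_size)
{
    const int qmax = (1 << (bits - 1)) - 1; // 7, 31 or 127

    BlockQuantized q;
    q.bits = bits;
    q.block_size = block_size;
    q.codes.resize(w.size());

    for (size_t start = 0; start < w.size(); start += (size_t)block_size)
    {
        const size_t end = std::min(w.size(), start + (size_t)block_size);

        float absmax = 0.f;
        for (size_t i = start; i < end; i++)
            absmax = std::max(absmax, std::fabs(w[i]));

        // An all-zero block gets scale 1 to avoid dividing by zero.
        const float scale = absmax > 0.f ? absmax / qmax : 1.f;
        q.scales.push_back(scale);

        for (size_t i = start; i < end; i++)
        {
            int v = (int)std::lround(w[i] / scale);
            v = std::max(-qmax, std::min(qmax, v));
            q.codes[i] = (int8_t)v;
        }
    }
    return q;
}

// Dequantization multiplies each code by the scale of the block it belongs to.
static std::vector<float> dequantize_blocks(const BlockQuantized& q)
{
    std::vector<float> w(q.codes.size());
    for (size_t i = 0; i < w.size(); i++)
        w[i] = q.codes[i] * q.scales[i / (size_t)q.block_size];
    return w;
}
```

With this scheme the per-weight reconstruction error is bounded by about half the block's scale, which is why smaller blocks (more scales) tend to trade storage for accuracy.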
./ncnnllm2int468 qwen3_decoder.ncnn.param qwen3_decoder.ncnn.bin qwen3_decoder-int6.ncnn.param qwen3_decoder-int6.ncnn.bin
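
The command above produces int6 weights, but the storage layout is not described in this thread. As an illustration only, one common way to store 6-bit codes is to pack four values into three bytes. The sketch below assumes a low-bits-first order and two's-complement codes in [-32, 31]. `pack4_int6` and `unpack4_int6` are hypothetical helpers, not functions from this PR.

```cpp
#include <cstdint>

// Pack four signed 6-bit values (range [-32, 31]) into three bytes.
// Assumed layout: value 0 occupies the lowest 6 bits of the 24-bit group.
static void pack4_int6(const int8_t v[4], uint8_t out[3])
{
    // Keep only the low 6 bits of each two's-complement value.
    const uint32_t a = (uint32_t)(v[0] & 0x3f);
    const uint32_t b = (uint32_t)(v[1] & 0x3f);
    const uint32_t c = (uint32_t)(v[2] & 0x3f);
    const uint32_t d = (uint32_t)(v[3] & 0x3f);

    const uint32_t bits = a | (b << 6) | (c << 12) | (d << 18);

    out[0] = (uint8_t)(bits & 0xff);
    out[1] = (uint8_t)((bits >> 8) & 0xff);
    out[2] = (uint8_t)((bits >> 16) & 0xff);
}

// Inverse of pack4_int6: extract four 6-bit fields and sign-extend them.
static void unpack4_int6(const uint8_t in[3], int8_t v[4])
{
    const uint32_t bits = (uint32_t)in[0] | ((uint32_t)in[1] << 8) | ((uint32_t)in[2] << 16);

    for (int i = 0; i < 4; i++)
    {
        const int u = (int)((bits >> (6 * i)) & 0x3f);
        // Fields with the top (sign) bit set represent negative values.
        v[i] = (int8_t)(u >= 32 ? u - 64 : u);
    }
}
```

Packing this way stores 6 bits per weight instead of 8, at the cost of a shift-and-mask step during dequantization.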
Codecov Report
:x: Patch coverage is 7.52688% with 86 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 95.88%. Comparing base (37336e7) to head (58cc1f3).
:warning: Report is 2 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/layer/gemm.cpp | 7.52% | 86 Missing :warning: |
Additional details and impacted files
| Coverage diff | master | #6439 | +/- |
|---|---|---|---|
| Coverage | 95.62% | 95.88% | +0.26% |
| Files | 844 | 844 | |
| Lines | 266761 | 266834 | +73 |
| Hits | 255080 | 255859 | +779 |
| Misses | 11681 | 10975 | -706 |
The binary size change of libncnn.so (bytes)
| architecture | base size | pr size | difference |
|---|---|---|---|
| x86_64 | 15316400 | 15324592 | +8192 :warning: |
| armhf | 6229892 | 6234020 | +4128 :warning: |
| aarch64 | 9527616 | 9527536 | -80 :kissing_heart: |