[WIP] gemm block quantization for llm decoder style

Open nihui opened this issue 3 months ago • 3 comments

  • [ ] move int468 dequant code to x86/arm/... create_pipeline
  • [ ] how to encode block_size and storage type ?
  • [ ] try fp4 e2m1/e3 type ?
  • [ ] comp table ?
  • [ ] expand to more general gemm ?
  • [ ] port union hack to platform-independent style
  • [ ] gemm test++
  • [ ] doc++

./ncnnllm2int468 qwen3_decoder.ncnn.param qwen3_decoder.ncnn.bin qwen3_decoder-int6.ncnn.param qwen3_decoder-int6.ncnn.bin
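
For readers unfamiliar with the term in the title, the general idea of block quantization can be sketched as follows: weights are split into fixed-size blocks, each block stores one float scale, and every weight is stored as a small signed integer code. This is a minimal illustration, not the code from this PR. The names `BlockQuantized`, `quantize_blocks` and `dequantize_blocks` are hypothetical, symmetric absmax scaling is an assumption, and codes are kept one per `int8_t` rather than bit-packed, so the open questions above about encoding `block_size` and the storage type are deliberately not addressed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container, NOT ncnn's storage layout:
// one float scale per block plus one signed integer code per weight.
struct BlockQuantized
{
    int block_size;                 // number of weights sharing one scale
    int bits;                       // e.g. 4, 6 or 8
    std::vector<float> scales;      // one entry per block
    std::vector<int8_t> codes;      // one entry per weight, in [-qmax, qmax]
};

// Symmetric absmax quantization of a flat weight array, block by block.
static BlockQuantized quantize_blocks(const std::vector<float>& w, int block_size, int bits)
{
    assert(block_size > 0);
    assert(bits >= 2 && bits <= 8);

    const int qmax = (1 << (bits - 1)) - 1; // 7 for int4, 31 for int6, 127 for int8

    BlockQuantized q;
    q.block_size = block_size;
    q.bits = bits;
    q.codes.resize(w.size());

    for (size_t start = 0; start < w.size(); start += (size_t)block_size)
    {
        const size_t end = std::min(w.size(), start + (size_t)block_size);

        // The largest magnitude in the block determines its scale.
        float absmax = 0.f;
        for (size_t i = start; i < end; i++)
            absmax = std::max(absmax, std::fabs(w[i]));

        // An all-zero block gets scale 0 and all-zero codes.
        const float scale = absmax / qmax;
        const float inv_scale = scale > 0.f ? 1.f / scale : 0.f;
        q.scales.push_back(scale);

        for (size_t i = start; i < end; i++)
        {
            int v = (int)std::lround(w[i] * inv_scale);
            v = std::max(-qmax, std::min(qmax, v)); // clamp against rounding overshoot
            q.codes[i] = (int8_t)v;
        }
    }
    return q;
}

// Dequantization: each code is multiplied by the scale of the block it belongs to.
static std::vector<float> dequantize_blocks(const BlockQuantized& q)
{
    std::vector<float> w(q.codes.size());
    for (size_t i = 0; i < w.size(); i++)
        w[i] = q.codes[i] * q.scales[i / (size_t)q.block_size];
    return w;
}
```

Calling `quantize_blocks(weights, 32, 6)` followed by `dequantize_blocks` reconstructs each weight to within about half of its block's scale. A smaller `block_size` tracks local weight ranges more closely at the cost of storing more scales.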

nihui, Dec 04 '25 11:12

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

tencent-adm, Dec 04 '25 11:12

Codecov Report

:x: Patch coverage is 7.52688% with 86 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 95.88%. Comparing base (37336e7) to head (58cc1f3).
:warning: Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/layer/gemm.cpp | 7.52% | 86 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6439      +/-   ##
==========================================
+ Coverage   95.62%   95.88%   +0.26%     
==========================================
  Files         844      844              
  Lines      266761   266834      +73     
==========================================
+ Hits       255080   255859     +779     
+ Misses      11681    10975     -706     

codecov-commenter, Dec 04 '25 11:12

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size | difference |
|---|---|---|---|
| x86_64 | 15316400 | 15324592 | +8192 :warning: |
| armhf | 6229892 | 6234020 | +4128 :warning: |
| aarch64 | 9527616 | 9527536 | -80 :kissing_heart: |

github-actions[bot], Dec 04 '25 11:12