
Locating the code for activation quantization with group size 64?

Open unbelievable3513 opened this issue 8 months ago • 0 comments

Hey there! I have a simple question after a lot of code searching: which code shows that the activations are quantized per group of 64 elements? From quantize_w4a4_from_fpsum_warp (https://github.com/mit-han-lab/nunchaku/blob/main/src/kernels/zgemm/gemm_w4a4.cuh#L460) I learned that input[2][8] (2 × 8 half2_t) holds the statistics for 2 scales. That means every input[x] (16 half elements) per line gets one single scale, which is then saved to output_scale and loaded through q_act.scales (I already know that IN_FEATURE_PAD/64 is allocated for it). Thank you very much! 😊
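To make the question concrete, here is a minimal Python sketch of what "one scale per group of 64" means for a single activation row. It is not taken from the nunchaku kernel: the function names, the symmetric int4 range with `QMAX = 7`, and the idea that four 16-element partial maxima are reduced into one group scale are all assumptions for illustration. The only thing it demonstrates for certain is the arithmetic: a scale computed over 64 elements equals the scale obtained by combining the absolute maxima of its four 16-element chunks.

```python
GROUP_SIZE = 64  # group size named in the question
CHUNK = 16       # half elements held by one input[x], per the question
QMAX = 7         # assumed symmetric int4 range [-8, 7]; not confirmed by the source


def absmax(values):
    """Largest absolute value in a sequence."""
    return max(abs(v) for v in values)


def scales_direct(row):
    """One scale per GROUP_SIZE contiguous elements, computed directly."""
    assert len(row) % GROUP_SIZE == 0
    return [absmax(row[g:g + GROUP_SIZE]) / QMAX
            for g in range(0, len(row), GROUP_SIZE)]


def scales_from_chunks(row):
    """Same scales, built from per-16-element partial maxima (hypothetical reduction)."""
    assert len(row) % GROUP_SIZE == 0
    chunk_max = [absmax(row[c:c + CHUNK]) for c in range(0, len(row), CHUNK)]
    per_group = GROUP_SIZE // CHUNK  # 4 chunks make up one group
    return [max(chunk_max[g:g + per_group]) / QMAX
            for g in range(0, len(chunk_max), per_group)]


def quantize_row(row, scales):
    """Round each element by its group's scale and clamp to [-8, 7]."""
    out = []
    for i, v in enumerate(row):
        s = scales[i // GROUP_SIZE]
        q = 0 if s == 0 else round(v / s)
        out.append(max(-QMAX - 1, min(QMAX, q)))
    return out


if __name__ == "__main__":
    row = [((i * 37) % 101 - 50) / 10.0 for i in range(128)]
    scales = scales_direct(row)
    print(len(scales), "scales for", len(row), "elements")
    print(scales == scales_from_chunks(row))
```

Under this reading, a row of IN_FEATURE_PAD elements needs IN_FEATURE_PAD/64 scales, which matches the allocation mentioned above. Whether the kernel actually combines the 16-element statistics this way is exactly what the question asks.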

unbelievable3513 • Mar 25 '25 08:03