imatrix: add option to display importance score statistics for a given imatrix file
A new --show-statistics option generates a report highlighting which tensors/layers contribute the most to a model's output. The report is sorted from highest to lowest influence. The process computes the average importance score per tensor/layer, calculates each one's percentage contribution to the total, and exits immediately after completion.
This PR can be used along with quantize: Handle user-defined quantization levels for additional tensors to do layer-wise quantization similar, though not identical, to the process described in Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels
Output example:
llama-imatrix --in-file imatrix-DeepSeek-R1-Distill-Llama-8B-small.dat --show-statistics
Computing statistics for imatrix-DeepSeek-R1-Distill-Llama-8B-small.dat (225 tensors)
Layer Tensor μ(Importance Scores) Contribution
================================================================================
- output 5523.92 13.9226 %
27 attn_v 356.58 0.8987 %
27 attn_k 356.58 0.8987 %
27 attn_q 356.58 0.8987 %
24 attn_k 347.19 0.8751 %
24 attn_q 347.19 0.8751 %
24 attn_v 347.19 0.8751 %
25 attn_q 346.77 0.8740 %
25 attn_k 346.77 0.8740 %
25 attn_v 346.77 0.8740 %
29 attn_v 344.46 0.8682 %
...
0 ffn_down 0.09 0.0002 %
Nice idea, seems like something we discussed last time? @bartowski1182
Btw, is it possible to show importance scores from an existing imatrix file @EAddario ?
Thank you @ngxson. Yes, it will process any imatrix file produced by llama-imatrix, but it is restricted to a single file (it does not deal with multiple --in-file)
Isn't this just related to the hidden state norms getting larger as you move through the different layers? If so, then it won't really account for the accumulation of errors caused by an early layer on the final output?
Not sure if I'm understanding the comment correctly @jukofyork, but the logic I'm using to identify the most influential tensors/layers is to simply average the importance scores (IS) for each, add those averages together, and then compute their individual contributions from the total.
The logic llama-imatrix uses to calculate the IS is to square the value of the activations fed to the corresponding weights during inference, keep a running total of how many times that particular value has been updated, and then save the average when inference has finished.
This only applies to 2d or larger tensors, so it will ignore norms (1d), but since errors influence which weights get updated (and how frequently), the IS does account for errors, albeit indirectly.
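For illustration, a minimal Python sketch of that accumulation, conceptual only (the names `sums`, `counts` and `collect` are placeholders, not the actual IMatrixCollector code):

```python
import numpy as np

# Conceptual sketch: for every matrix multiplication, square the input
# activations and keep a running sum plus an update count per input channel.
sums = {}    # tensor name -> per-channel sum of squared activations
counts = {}  # tensor name -> number of updates seen so far

def collect(tensor_name: str, activations: np.ndarray):
    """activations: (n_tokens, n_channels) inputs fed to this tensor."""
    sq = (activations ** 2).sum(axis=0)           # per-channel sum of squares
    sums[tensor_name] = sums.get(tensor_name, 0.0) + sq
    counts[tensor_name] = counts.get(tensor_name, 0) + activations.shape[0]

def importance_scores(tensor_name: str) -> np.ndarray:
    # the stored score is the running average of the squared activations
    return sums[tensor_name] / counts[tensor_name]
```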
Make sense?
@EAddario
I think the mean squared activations (which would be their variance assuming a mean of 0) cannot really be compared across tensors without some kind of normalization, because the values of the model weights can also affect the relative importance of the activations. (llama-imatrix calculates the sum of squared activations and their count, it doesn't directly take into account the model weights; it's only when quantizing that they are taken into account (and even then it depends on the type))
The goal here is to find which layers need more precision, right?
I'm not sure if the mean squared activations really are what you're looking for.
There might be other measures like skewness and kurtosis which may be useful. But I'm not sure if taking only the activations into account is the right way to get the insights you seek.
What I'd like to try eventually would be to use a simultaneous quantization algorithm to try multiple bit-widths at once in a reasonable amount of time so that the errors can be compared per tensor to help with the choice of quantization type.
This would be possible for x[i] ≈ q[i] * s types using a cumulative search similar to #12557, but I don't know how to do that with x[i] ≈ q[i] * s - m types yet.
I still think it can be useful to have some way to visualize what is in imatrix files and/or the distribution of the activations. But not all the necessary information is kept in imatrix files, only the per-channel sum of squared activations, which is a bit limiting for this purpose. Adding more measures (like the mean, skewness and kurtosis, either per-tensor or per-channel) in the file would be easier after #9400.
In the paper you link (https://arxiv.org/pdf/2406.17415), the closest thing to what you propose would be the LIM (layer input modification) score, which is calculated as follows (in Section 3.1), where $L_i$ is the i-th layer, and $L_i^I$ are the input activations and $L_i^O$ the corresponding output activations:
$$ LIM(L_i) = -\frac{L_i^I \cdot L_i^O}{\left|L_i^I\right| \left|L_i^O\right|} $$
llama-imatrix technically has access to both the input and output activations of a layer, but only uses its input.
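For reference, a small Python sketch of the LIM score exactly as defined above (the `layer_in` / `layer_out` arrays are hypothetical; llama-imatrix does not currently compute this):

```python
import numpy as np

def lim_score(layer_in: np.ndarray, layer_out: np.ndarray) -> float:
    """Negative cosine similarity between a layer's input and output
    activations, as in Section 3.1 of the Layer-Wise Quantization paper."""
    num = float(np.dot(layer_in, layer_out))
    den = float(np.linalg.norm(layer_in) * np.linalg.norm(layer_out))
    return -num / den
```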
Very clear now, thanks @compilade. You're correct, I'm using the mean squared activations to identify which tensors/layers produce large-magnitude activations and ~~whilst~~ agree it isn't as accurate as, say, correlation / covariance / LIM ~~I think it's still a reasonable proxy, especially considering how the importance scores are actually used during quantization (quant_weights in ggml-quants.c)~~
I had a quick look at your PRs. I definitely like the idea of storing imatrix data in GGUF format and can appreciate how it would improve the generation of these types of stats. #12557 is quite intriguing but, truth be told, I haven't had a chance to digest it fully (there's a lot going on!). I would love to see it merged, especially if it improves ternary quants.
Had a chance to think this more thoroughly and now get the implications of @jukofyork and @compilade's comments. Agree my current approach is not really identifying influence but rather score "growth". Back to the drawing table 😆
I can help you with this, but it will need a fair bit of compute to calculate. I've not got time to explain fully but basically:
- Decide on what you are optimising: L2-error in the final hidden-state, perplexity (ie: "wellcalibratedness" of the top choice), KL-divergence (ie: "wellcalibratedness" of the full probability distribution), earth-movers-distance, hinge-loss, or whatever.
- Use some form of (2-sided) Finite-Differences to estimate the gradient of the loss you are optimising with respect to moving up/down 1 bit of quant for a given parameter group (eg: layer-based or tensor-based groupings).
You will likely have to transform the loss measure somehow:
- Perplexity is actually just a transformed version of negative log-loss, as is McFadden's Pseudo-R-squared and a whole host of different domain-specific measures of "wellcalibratedness". The fact people often plot the `log-PPL` suggests this is not a good metric to use for this...
- The real thing you are measuring is "bits" (in the Information Theory sense; not the normal colloquial term) and negative log-loss has a nice interpretation for this (the late David MacKay's book Information Theory, Inference, and Learning Algorithms is an amazing read to see the links if you are more interested in this!).
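For concreteness (standard definitions, not part of the original comment): with mean negative log-likelihood $\bar{\ell}$ measured over $T$ tokens,

$$
\bar{\ell} = -\frac{1}{T}\sum_{t=1}^{T} \ln p(x_t \mid x_{<t}), \qquad
\mathrm{PPL} = e^{\bar{\ell}}, \qquad
\text{bits/token} = \frac{\bar{\ell}}{\ln 2}
$$

so plotting log-PPL is just plotting the mean negative log-loss, i.e. nats (or, divided by $\ln 2$, bits) per token.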
Assuming Finite-Differences is too costly to perform, then you can use a stochastic approximation (FDSA) or its extension SPSA to estimate the gradients using whatever compute you can muster up.
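A rough sketch of the SPSA estimator applied to a vector of per-group bit widths (everything here, including `loss_fn` and the ±1-bit perturbation size, is hypothetical and only illustrates the estimator, not an existing llama.cpp workflow):

```python
import numpy as np

def spsa_gradient(loss_fn, bits: np.ndarray, n_samples: int = 4) -> np.ndarray:
    """Estimate d(loss)/d(bits) for per-group bit widths using simultaneous
    perturbation: perturb every group by +/-1 bit at once and average the
    two-sided finite-difference estimates."""
    grad = np.zeros_like(bits, dtype=np.float64)
    for _ in range(n_samples):
        delta = np.random.choice([-1.0, 1.0], size=bits.shape)  # Rademacher
        loss_plus = loss_fn(bits + delta)
        loss_minus = loss_fn(bits - delta)
        grad += (loss_plus - loss_minus) / (2.0 * delta)
    return grad / n_samples
```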
I've edited the post above quite a lot, so it should hopefully make more sense (in case you're reading from the email notification).
Thank you, now I know what I'm doing over the weekend 😁
On a serious note, much appreciated @jukofyork. Plenty of food for thought. I'll give it proper consideration
No problem and just remember the most important thing to figure out is exactly what you are optimising first! There are actually a lot of compelling options for this; each with their own reasons for and against... All have different costs to compute too:
- Metrics using the full probability distribution like KL-divergence or earth-movers distance are the most expensive.
- Then metrics that need a probability and have to pass through softmax are next.
- Then metrics that require multiplication with `lm_head` (which in modern models can be >> `hidden_dim`!) are next.
- Metrics involving the final hidden state are the cheapest.
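To make the cost ordering concrete, a small sketch of the two extremes (hypothetical logits / hidden-state arrays):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Most expensive end: needs the full vocab-sized distributions."""
    p, q = softmax(ref_logits), softmax(quant_logits)
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

def hidden_state_l2(ref_h: np.ndarray, quant_h: np.ndarray) -> float:
    """Cheapest end: no lm_head multiply, no softmax."""
    return float(np.linalg.norm(ref_h - quant_h, axis=-1).mean())
```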
Following from @jukofyork and @compilade's remarks and suggestions, I've made some changes in my approach.
To set the context and explain exactly what problem I'm trying to solve, I have two objectives in mind:
- find a way to identify and rank which tensors/layers are most influential during inference, and
- implement changes in a 100% backwards compatible way. That is, they must work with any imatrix file already generated.
The direct implication of constraint "2" is no changes to IMatrixCollector::collect_imatrix, meaning that "1" has to rely solely on the importance scores (IS) stored in imatrix files, without access to the underlying weights.
As noted by @compilade, IS "...cannot really be compared across tensors without some kind of normalization, because the values of the model weights can also affect the relative importance of the activations...". However, IS are a direct measurement of how active a particular weight was during inference, based on a given input prompt (more on this later), and can therefore be used as an (arguably suboptimal) proxy for "influence". Instead of relying on the average, a better metric is the sum of IS per tensor/layer: the higher the number, the "busier" the tensor/layer and the more it contributes to upstream computations.
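As a rough illustration of that ranking, a sketch only (`imatrix_scores` is a hypothetical mapping of tensor name to per-channel importance scores):

```python
import numpy as np

def rank_by_sum_of_scores(imatrix_scores: dict[str, np.ndarray]) -> list[tuple[str, float]]:
    """Rank tensors by the sum of their per-channel importance scores
    (the figure referred to as the sum of IS per tensor), highest first."""
    totals = {name: float(scores.sum()) for name, scores in imatrix_scores.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Usage example (hypothetical): rank only the attn_k tensors across layers.
# attn_k_only = {n: s for n, s in all_scores.items() if "attn_k" in n}
# print(rank_by_sum_of_scores(attn_k_only)[:5])
```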
Although there are better metrics (e.g. gradient of loss, covariance, LIM, etc.), those would require changes to the imatrix collection process, which is beyond the scope of what I'm trying to do, at least for now. Having said that, it's worth keeping an eye on the work @ubergarm is doing in WIP Compute per layer LIM Scores during imatrix
Tests performed during quantization of DeepSeek-R1-Distill-Qwen-7B seem to confirm that Σ(Bias), which is what I'm calling the sum of IS per tensor, is a good influence indicator, as can be seen in the table below, where (↑) represents quantizing half of the most influential tensors (as per Σ(Bias)) at a higher bit level, and (↓) represents quantizing half of the least influential tensors at a higher bit level:
| Model | μPPL (↑) | 𝜌PPL (↑) | μKLD (↑) | RMS Δp (↑) | μPPL (↓) | 𝜌PPL (↓) | μKLD (↓) | RMS Δp (↓) |
|---|---|---|---|---|---|---|---|---|
| IQ3_M | 28.740047 ±0.291290 | 97.19% | 0.229742 ±0.000770 | 11.793 ±0.050 | 28.721610 ±0.288684 | 96.94% | 0.249550 ±0.000841 | 12.332 ±0.053 |
| IQ3_S | 30.290800 ±0.307742 | 96.32% | 0.310982 ±0.001014 | 13.415 ±0.057 | 31.315997 ±0.316217 | 95.95% | 0.341996 ±0.001082 | 14.292 ±0.058 |
| IQ4_NL | 23.570503 ±0.226124 | 98.59% | 0.102854 ±0.000465 | 8.080 ±0.046 | 23.862907 ±0.226366 | 98.51% | 0.117131 ±0.000395 | 8.560 ±0.040 |
| Q3_K_L | 24.160705 ±0.229989 | 97.75% | 0.173336 ±0.000603 | 10.337 ±0.048 | 24.853047 ±0.240164 | 97.56% | 0.195060 ±0.000681 | 10.801 ±0.050 |
| Q3_K_M | 24.967196 ±0.239198 | 97.50% | 0.194299 ±0.000681 | 10.877 ±0.050 | 25.212714 ±0.244888 | 97.31% | 0.214337 ±0.000747 | 11.278 ±0.052 |
| Q3_K_S | 25.661098 ±0.246635 | 96.84% | 0.243850 ±0.000852 | 12.143 ±0.054 | 25.916397 ±0.250857 | 96.60% | 0.270237 ±0.000928 | 12.737 ±0.057 |
| Q4_K_M | 23.125382 ±0.221860 | 99.24% | 0.053997 ±0.000215 | 5.795 ±0.032 | 23.283282 ±0.223537 | 99.13% | 0.065186 ±0.000241 | 6.273 ±0.034 |
| Q4_K_S | 23.156199 ±0.222000 | 99.18% | 0.058337 ±0.000233 | 6.026 ±0.034 | 23.263445 ±0.223330 | 99.08% | 0.069488 ±0.000261 | 6.429 ±0.035 |
| Q5_K_M | 22.726887 ±0.217691 | 99.75% | 0.013562 ±0.000062 | 2.924 ±0.020 | 22.903038 ±0.220259 | 99.72% | 0.015792 ±0.000063 | 3.114 ±0.019 |
| Q5_K_S | 22.766826 ±0.218244 | 99.74% | 0.014589 ±0.000070 | 3.024 ±0.020 | 22.892603 ±0.220059 | 99.71% | 0.017023 ±0.000073 | 3.231 ±0.020 |
| Q6_K | 22.859294 ±0.219461 | 99.87% | 0.004317 ±0.000022 | 1.682 ±0.016 | 22.847118 ±0.219384 | 99.86% | 0.004950 ±0.000021 | 1.767 ±0.012 |
| Q8_0 | 22.840693 ±0.219408 | 99.90% | 0.001614 ±0.000011 | 1.050 ±0.010 | 22.832647 ±0.219310 | 99.90% | 0.001830 ±0.000024 | 1.110 ±0.016 |
For reference, compared to the naive Q4_K_M model, the layer-wise quantized version is 10.7% smaller (4.68GB vs 4.18GB) with only a 0.35% penalty on μPPL:
| Model | μPPL | 𝜌PPL | μKLD | RMS Δp |
|---|---|---|---|---|
| Q4_K_M | 22.936432 ±0.220488 | 99.59% | 0.026917 ±0.000105 | 4.100 ±0.024 |
Whilst I was considering @jukofyork's feedback, I came to think about how much the benefit of using an imatrix depends on the quality of the prompt used during its generation, and how difficult it is to determine how well a given prompt "exercises" all of the model's capabilities, so I added additional statistics to help in that regard.
As things stand at the moment, --show-statistics produces the following statistics:
- Σ(Bias): the sum of all squared activations across the tensor (i.e. the importance scores)
- Min & Max: minimum and maximum activation values
- μ & σ: the activations' mean and standard deviation
- % Active: proportion of elements whose average activation exceeds a very small threshold (1e-6). Helpful to determine how alive/dormant the tensor is during inference
- N: number of activations in the tensor
- Entropy: entropy of the activation distribution, in bits (standard Shannon entropy measurement) $S = -\sum_{i=1}^N p_i \log_2 p_i$
- E (norm): normalized entropy, $E_{norm}=\frac{-\sum_{i=1}^N p_i \log_2 p_i}{\log_2 N}$. These two metrics can be used to determine how well a prompt "exercises" the model's capabilities
- ZD Score: z-score distribution as described in 3.1 Layer Importance Scores in the Layer-Wise Quantization paper
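A compact sketch of how these per-tensor figures could be derived from a vector of per-channel scores (illustrative only; the threshold and field names mirror the descriptions above, not the actual implementation):

```python
import numpy as np

def tensor_stats(scores: np.ndarray, eps: float = 1e-6) -> dict:
    """scores: per-channel average squared activations for one tensor."""
    total = float(scores.sum())                       # Σ(Bias)
    p = scores / total if total > 0 else np.full_like(scores, 1.0 / scores.size)
    p = np.clip(p, 1e-12, None)                       # avoid log(0)
    entropy = float(-(p * np.log2(p)).sum())          # Shannon entropy, bits
    std = float(scores.std()) or 1.0                  # guard against zero variance
    z = (scores - scores.mean()) / std                # per-channel z-scores
    return {
        "sum_bias": total,
        "min": float(scores.min()),
        "max": float(scores.max()),
        "mean": float(scores.mean()),
        "std": float(scores.std()),
        "pct_active": float((scores > eps).mean() * 100.0),
        "n": int(scores.size),
        "entropy": entropy,
        "entropy_norm": float(entropy / np.log2(scores.size) * 100.0),
        "zd_score": float((z > 1.0).mean() * 100.0),  # % of channels with z > 1
    }
```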
Thanks for the update and defining the statistics gleaned from an existing imatrix.dat file. I pulled your branch and gave it a try on LLaMA-2-13B to compare against the same model used in that Layer-wise Quantization Paper (likely different quantization).
compute imatrix and then show statistics
Compute imatrix
$ git branch | grep '*'
* (HEAD detached at EAddario/imatrix)
$ git rev-parse --short HEAD
200d88c8
$ ./build/bin/llama-imatrix --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 5136 (200d88c8)
built with cc (GCC) 14.2.1 20250128 for x86_64-pc-linux-gnu
$ ./build/bin/llama-imatrix \
--verbosity 1 \
-m /mnt/astrodata/llm/models/TheBloke/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q8_0.gguf \
-f wiki.test.raw \
-o imatrix-wiki-test-llama-2-13b-chat-Q8_0-gguf.dat \
--ctx-size 512 \
--threads 16
...
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q8_0: 282 tensors
...
compute_imatrix: tokenizing the input ..
compute_imatrix: tokenization took 397.256 ms
compute_imatrix: computing over 655 chunks with batch_size 512
compute_imatrix: 1.44 seconds per pass - ETA 15.73 minutes
[1]4.8087,[2]5.4272,[3]6.3040,[4]7.0129,[5]7.1984,[6]7.0947,[7]7.2490,[8]7.3314,[9]7.5682,
...
Final estimate: PPL = 6.5257 +/- 0.04210
save_imatrix: stored collected data after 655 chunks in imatrix-wiki-test-llama-2-13b-chat-Q8_0-gguf.dat
llama_perf_context_print: load time = 22623.39 ms
llama_perf_context_print: prompt eval time = 861807.99 ms / 335360 tokens ( 2.57 ms per token, 389.14 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 891205.70 ms / 335361 tokens
Show Statistics
$ ./build/bin/llama-imatrix \
--in-file imatrix-wiki-test-llama-2-13b-chat-Q8_0-gguf.dat \
--show-statistics
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
Computing statistics for imatrix-wiki-test-llama-2-13b-chat-Q8_0-gguf.dat (280 tensors)
Layer Tensor Σ(Bias) Min Max μ σ % Active N Entropy E (norm) ZD Score
==========================================================================================================================================================================
30 attn_q 1321.16 0.0000 22.1645 0.2580 0.5248 99.98% 5120 11.8988 96.57% 5.4688
30 attn_v 1321.16 0.0000 22.1645 0.2580 0.5248 99.98% 5120 11.8988 96.57% 5.4688
30 attn_k 1321.16 0.0000 22.1645 0.2580 0.5248 99.98% 5120 11.8988 96.57% 5.4688
39 ffn_down 1290.84 0.0042 29.1379 0.0934 0.4147 100.00% 13824 12.1372 88.24% 25.9693
32 attn_v 1285.53 0.0000 17.6335 0.2511 0.4668 99.98% 5120 11.9402 96.90% 5.4688
32 attn_k 1285.53 0.0000 17.6335 0.2511 0.4668 99.98% 5120 11.9402 96.90% 5.4688
32 attn_q 1285.53 0.0000 17.6335 0.2511 0.4668 99.98% 5120 11.9402 96.90% 5.4688
34 attn_q 1256.21 0.0000 14.0536 0.2454 0.4260 99.98% 5120 11.9679 97.13% 5.6641
34 attn_v 1256.21 0.0000 14.0536 0.2454 0.4260 99.98% 5120 11.9679 97.13% 5.6641
34 attn_k 1256.21 0.0000 14.0536 0.2454 0.4260 99.98% 5120 11.9679 97.13% 5.6641
29 attn_k 1204.44 0.0000 23.4754 0.2352 0.5280 99.98% 5120 11.8456 96.13% 5.4688
29 attn_v 1204.44 0.0000 23.4754 0.2352 0.5280 99.98% 5120 11.8456 96.13% 5.4688
29 attn_q 1204.44 0.0000 23.4754 0.2352 0.5280 99.98% 5120 11.8456 96.13% 5.4688
33 attn_q 1183.21 0.0000 14.3861 0.2311 0.3921 99.98% 5120 11.9785 97.21% 5.4688
33 attn_v 1183.21 0.0000 14.3861 0.2311 0.3921 99.98% 5120 11.9785 97.21% 5.4688
33 attn_k 1183.21 0.0000 14.3861 0.2311 0.3921 99.98% 5120 11.9785 97.21% 5.4688
31 attn_k 1182.86 0.0000 20.5292 0.2310 0.4778 99.98% 5120 11.8971 96.55% 5.4688
31 attn_v 1182.86 0.0000 20.5292 0.2310 0.4778 99.98% 5120 11.8971 96.55% 5.4688
31 attn_q 1182.86 0.0000 20.5292 0.2310 0.4778 99.98% 5120 11.8971 96.55% 5.4688
35 attn_k 1173.15 0.0000 12.3308 0.2291 0.3496 99.98% 5120 12.0212 97.56% 5.6641
35 attn_v 1173.15 0.0000 12.3308 0.2291 0.3496 99.98% 5120 12.0212 97.56% 5.6641
35 attn_q 1173.15 0.0000 12.3308 0.2291 0.3496 99.98% 5120 12.0212 97.56% 5.6641
28 attn_v 1161.62 0.0000 24.2086 0.2269 0.5975 99.98% 5120 11.7171 95.09% 5.6641
28 attn_q 1161.62 0.0000 24.2086 0.2269 0.5975 99.98% 5120 11.7171 95.09% 5.6641
28 attn_k 1161.62 0.0000 24.2086 0.2269 0.5975 99.98% 5120 11.7171 95.09% 5.6641
27 attn_q 1152.05 0.0000 21.7389 0.2250 0.5541 99.98% 5120 11.7706 95.53% 5.4688
27 attn_k 1152.05 0.0000 21.7389 0.2250 0.5541 99.98% 5120 11.7706 95.53% 5.4688
27 attn_v 1152.05 0.0000 21.7389 0.2250 0.5541 99.98% 5120 11.7706 95.53% 5.4688
36 attn_q 1125.94 0.0000 12.8438 0.2199 0.3751 99.98% 5120 11.9677 97.13% 5.8594
36 attn_k 1125.94 0.0000 12.8438 0.2199 0.3751 99.98% 5120 11.9677 97.13% 5.8594
36 attn_v 1125.94 0.0000 12.8438 0.2199 0.3751 99.98% 5120 11.9677 97.13% 5.8594
38 attn_k 1072.28 0.0151 12.4462 0.2094 0.3015 100.00% 5120 12.0386 97.70% 6.4453
38 attn_v 1072.28 0.0151 12.4462 0.2094 0.3015 100.00% 5120 12.0386 97.70% 6.4453
38 attn_q 1072.28 0.0151 12.4462 0.2094 0.3015 100.00% 5120 12.0386 97.70% 6.4453
37 attn_v 1071.17 0.0126 14.2128 0.2092 0.3167 100.00% 5120 12.0204 97.55% 6.2500
37 attn_k 1071.17 0.0126 14.2128 0.2092 0.3167 100.00% 5120 12.0204 97.55% 6.2500
37 attn_q 1071.17 0.0126 14.2128 0.2092 0.3167 100.00% 5120 12.0204 97.55% 6.2500
25 attn_v 1037.08 0.0000 23.9319 0.2026 0.6313 99.98% 5120 11.5734 93.93% 5.4688
25 attn_q 1037.08 0.0000 23.9319 0.2026 0.6313 99.98% 5120 11.5734 93.93% 5.4688
25 attn_k 1037.08 0.0000 23.9319 0.2026 0.6313 99.98% 5120 11.5734 93.93% 5.4688
26 attn_k 1031.55 0.0031 25.6229 0.2015 0.6353 100.00% 5120 11.5771 93.96% 5.6641
26 attn_v 1031.55 0.0031 25.6229 0.2015 0.6353 100.00% 5120 11.5771 93.96% 5.6641
26 attn_q 1031.55 0.0031 25.6229 0.2015 0.6353 100.00% 5120 11.5771 93.96% 5.6641
24 attn_k 955.35 0.0000 20.3266 0.1866 0.5947 99.98% 5120 11.5271 93.55% 5.8594
24 attn_q 955.35 0.0000 20.3266 0.1866 0.5947 99.98% 5120 11.5271 93.55% 5.8594
24 attn_v 955.35 0.0000 20.3266 0.1866 0.5947 99.98% 5120 11.5271 93.55% 5.8594
23 attn_k 950.08 0.0000 22.1702 0.1856 0.6765 99.98% 5120 11.3836 92.39% 5.4688
23 attn_v 950.08 0.0000 22.1702 0.1856 0.6765 99.98% 5120 11.3836 92.39% 5.4688
23 attn_q 950.08 0.0000 22.1702 0.1856 0.6765 99.98% 5120 11.3836 92.39% 5.4688
39 attn_q 926.54 0.0431 16.0860 0.1810 0.2805 100.00% 5120 12.0610 97.88% 5.8594
39 attn_k 926.54 0.0431 16.0860 0.1810 0.2805 100.00% 5120 12.0610 97.88% 5.8594
39 attn_v 926.54 0.0431 16.0860 0.1810 0.2805 100.00% 5120 12.0610 97.88% 5.8594
22 attn_v 916.79 0.0000 18.9033 0.1791 0.5414 99.98% 5120 11.5694 93.89% 5.8594
22 attn_q 916.79 0.0000 18.9033 0.1791 0.5414 99.98% 5120 11.5694 93.89% 5.8594
22 attn_k 916.79 0.0000 18.9033 0.1791 0.5414 99.98% 5120 11.5694 93.89% 5.8594
38 ffn_down 905.56 0.0059 75.8273 0.0655 0.7782 100.00% 13824 11.5526 83.99% 2.0255
19 attn_q 879.58 0.0100 28.6687 0.1718 0.8143 100.00% 5120 10.9550 88.91% 6.0547
19 attn_v 879.58 0.0100 28.6687 0.1718 0.8143 100.00% 5120 10.9550 88.91% 6.0547
19 attn_k 879.58 0.0100 28.6687 0.1718 0.8143 100.00% 5120 10.9550 88.91% 6.0547
36 ffn_up 870.19 0.0086 1.1614 0.1700 0.0388 100.00% 5120 12.2979 99.81% 38.4766
36 ffn_gate 870.19 0.0086 1.1614 0.1700 0.0388 100.00% 5120 12.2979 99.81% 38.4766
37 ffn_up 866.00 0.0098 1.3722 0.1691 0.0456 100.00% 5120 12.2901 99.74% 40.2344
37 ffn_gate 866.00 0.0098 1.3722 0.1691 0.0456 100.00% 5120 12.2901 99.74% 40.2344
21 attn_k 865.62 0.0092 22.5825 0.1691 0.7082 100.00% 5120 11.1497 90.49% 6.0547
21 attn_q 865.62 0.0092 22.5825 0.1691 0.7082 100.00% 5120 11.1497 90.49% 6.0547
21 attn_v 865.62 0.0092 22.5825 0.1691 0.7082 100.00% 5120 11.1497 90.49% 6.0547
13 attn_k 863.66 0.0136 41.3031 0.1687 1.1620 100.00% 5120 10.2387 83.09% 5.6641
13 attn_q 863.66 0.0136 41.3031 0.1687 1.1620 100.00% 5120 10.2387 83.09% 5.6641
13 attn_v 863.66 0.0136 41.3031 0.1687 1.1620 100.00% 5120 10.2387 83.09% 5.6641
3 ffn_down 863.54 0.0001 849.5108 0.0625 7.2252 100.00% 13824 0.2206 1.60% 0.0723
16 attn_v 860.58 0.0155 39.5863 0.1681 1.0040 100.00% 5120 10.5837 85.89% 6.0547
16 attn_q 860.58 0.0155 39.5863 0.1681 1.0040 100.00% 5120 10.5837 85.89% 6.0547
16 attn_k 860.58 0.0155 39.5863 0.1681 1.0040 100.00% 5120 10.5837 85.89% 6.0547
14 attn_q 859.59 0.0144 48.8121 0.1679 1.2058 100.00% 5120 10.1958 82.75% 5.4688
14 attn_v 859.59 0.0144 48.8121 0.1679 1.2058 100.00% 5120 10.1958 82.75% 5.4688
14 attn_k 859.59 0.0144 48.8121 0.1679 1.2058 100.00% 5120 10.1958 82.75% 5.4688
18 attn_k 843.95 0.0084 26.9360 0.1648 0.7675 100.00% 5120 10.9957 89.24% 6.0547
18 attn_v 843.95 0.0084 26.9360 0.1648 0.7675 100.00% 5120 10.9957 89.24% 6.0547
18 attn_q 843.95 0.0084 26.9360 0.1648 0.7675 100.00% 5120 10.9957 89.24% 6.0547
17 attn_k 842.77 0.0124 33.2876 0.1646 0.8841 100.00% 5120 10.7489 87.23% 5.8594
17 attn_v 842.77 0.0124 33.2876 0.1646 0.8841 100.00% 5120 10.7489 87.23% 5.8594
17 attn_q 842.77 0.0124 33.2876 0.1646 0.8841 100.00% 5120 10.7489 87.23% 5.8594
38 ffn_up 840.16 0.0088 2.6975 0.1641 0.0626 100.00% 5120 12.2701 99.58% 36.9141
38 ffn_gate 840.16 0.0088 2.6975 0.1641 0.0626 100.00% 5120 12.2701 99.58% 36.9141
35 ffn_up 835.32 0.0068 1.1382 0.1631 0.0333 100.00% 5120 12.3025 99.84% 40.2344
35 ffn_gate 835.32 0.0068 1.1382 0.1631 0.0333 100.00% 5120 12.3025 99.84% 40.2344
15 attn_q 820.47 0.0159 44.4388 0.1602 1.1185 100.00% 5120 10.2600 83.27% 5.2734
15 attn_v 820.47 0.0159 44.4388 0.1602 1.1185 100.00% 5120 10.2600 83.27% 5.2734
15 attn_k 820.47 0.0159 44.4388 0.1602 1.1185 100.00% 5120 10.2600 83.27% 5.2734
20 attn_k 810.73 0.0080 22.8515 0.1583 0.7303 100.00% 5120 10.9871 89.17% 6.0547
20 attn_v 810.73 0.0080 22.8515 0.1583 0.7303 100.00% 5120 10.9871 89.17% 6.0547
20 attn_q 810.73 0.0080 22.8515 0.1583 0.7303 100.00% 5120 10.9871 89.17% 6.0547
34 ffn_up 799.17 0.0067 1.0181 0.1561 0.0281 100.00% 5120 12.3064 99.87% 38.2812
34 ffn_gate 799.17 0.0067 1.0181 0.1561 0.0281 100.00% 5120 12.3064 99.87% 38.2812
12 attn_v 782.01 0.0126 46.9238 0.1527 1.2340 100.00% 5120 9.8808 80.19% 5.2734
12 attn_q 782.01 0.0126 46.9238 0.1527 1.2340 100.00% 5120 9.8808 80.19% 5.2734
12 attn_k 782.01 0.0126 46.9238 0.1527 1.2340 100.00% 5120 9.8808 80.19% 5.2734
33 ffn_up 764.58 0.0056 0.8259 0.1493 0.0239 100.00% 5120 12.3087 99.89% 46.4844
33 ffn_gate 764.58 0.0056 0.8259 0.1493 0.0239 100.00% 5120 12.3087 99.89% 46.4844
32 ffn_gate 736.26 0.0046 0.7709 0.1438 0.0227 100.00% 5120 12.3091 99.90% 45.8984
32 ffn_up 736.26 0.0046 0.7709 0.1438 0.0227 100.00% 5120 12.3091 99.90% 45.8984
10 attn_v 713.91 0.0092 39.3571 0.1394 1.0706 100.00% 5120 9.9807 81.00% 5.6641
10 attn_k 713.91 0.0092 39.3571 0.1394 1.0706 100.00% 5120 9.9807 81.00% 5.6641
10 attn_q 713.91 0.0092 39.3571 0.1394 1.0706 100.00% 5120 9.9807 81.00% 5.6641
9 attn_v 709.57 0.0059 35.1349 0.1386 0.9907 100.00% 5120 10.0564 81.61% 6.6406
9 attn_k 709.57 0.0059 35.1349 0.1386 0.9907 100.00% 5120 10.0564 81.61% 6.6406
9 attn_q 709.57 0.0059 35.1349 0.1386 0.9907 100.00% 5120 10.0564 81.61% 6.6406
31 ffn_gate 706.57 0.0035 0.5213 0.1380 0.0190 100.00% 5120 12.3114 99.91% 53.9062
31 ffn_up 706.57 0.0035 0.5213 0.1380 0.0190 100.00% 5120 12.3114 99.91% 53.9062
11 attn_k 695.69 0.0103 44.5534 0.1359 1.1356 100.00% 5120 9.7664 79.26% 5.4688
11 attn_q 695.69 0.0103 44.5534 0.1359 1.1356 100.00% 5120 9.7664 79.26% 5.4688
11 attn_v 695.69 0.0103 44.5534 0.1359 1.1356 100.00% 5120 9.7664 79.26% 5.4688
30 ffn_gate 678.07 0.0041 0.5778 0.1324 0.0203 100.00% 5120 12.3097 99.90% 47.6562
30 ffn_up 678.07 0.0041 0.5778 0.1324 0.0203 100.00% 5120 12.3097 99.90% 47.6562
39 ffn_gate 648.54 0.0191 5.6152 0.1267 0.0890 100.00% 5120 12.2396 99.33% 12.3047
39 ffn_up 648.54 0.0191 5.6152 0.1267 0.0890 100.00% 5120 12.2396 99.33% 12.3047
29 ffn_up 647.83 0.0048 0.4959 0.1265 0.0169 100.00% 5120 12.3115 99.92% 62.6953
29 ffn_gate 647.83 0.0048 0.4959 0.1265 0.0169 100.00% 5120 12.3115 99.92% 62.6953
28 ffn_up 621.34 0.0073 0.4593 0.1214 0.0171 100.00% 5120 12.3108 99.91% 59.5703
28 ffn_gate 621.34 0.0073 0.4593 0.1214 0.0171 100.00% 5120 12.3108 99.91% 59.5703
27 ffn_gate 596.51 0.0036 0.5035 0.1165 0.0176 100.00% 5120 12.3092 99.90% 63.4766
27 ffn_up 596.51 0.0036 0.5035 0.1165 0.0176 100.00% 5120 12.3092 99.90% 63.4766
8 attn_q 595.64 0.0067 34.9034 0.1163 0.8977 100.00% 5120 9.9023 80.36% 5.8594
8 attn_v 595.64 0.0067 34.9034 0.1163 0.8977 100.00% 5120 9.9023 80.36% 5.8594
8 attn_k 595.64 0.0067 34.9034 0.1163 0.8977 100.00% 5120 9.9023 80.36% 5.8594
37 ffn_down 592.02 0.0074 16.6926 0.0428 0.1790 100.00% 13824 12.6990 92.32% 25.3906
26 ffn_gate 568.09 0.0044 0.5478 0.1110 0.0182 100.00% 5120 12.3079 99.89% 53.3203
26 ffn_up 568.09 0.0044 0.5478 0.1110 0.0182 100.00% 5120 12.3079 99.89% 53.3203
25 ffn_gate 542.26 0.0052 0.5749 0.1059 0.0192 100.00% 5120 12.3055 99.87% 47.0703
25 ffn_up 542.26 0.0052 0.5749 0.1059 0.0192 100.00% 5120 12.3055 99.87% 47.0703
7 attn_k 536.38 0.0000 37.2838 0.1048 0.9200 99.98% 5120 9.3955 76.25% 6.6406
7 attn_q 536.38 0.0000 37.2838 0.1048 0.9200 99.98% 5120 9.3955 76.25% 6.6406
7 attn_v 536.38 0.0000 37.2838 0.1048 0.9200 99.98% 5120 9.3955 76.25% 6.6406
24 ffn_gate 513.76 0.0061 0.6509 0.1003 0.0216 100.00% 5120 12.3012 99.83% 37.5000
24 ffn_up 513.76 0.0061 0.6509 0.1003 0.0216 100.00% 5120 12.3012 99.83% 37.5000
6 attn_k 511.80 0.0000 34.5247 0.1000 0.7756 99.98% 5120 9.8035 79.56% 7.4219
6 attn_v 511.80 0.0000 34.5247 0.1000 0.7756 99.98% 5120 9.8035 79.56% 7.4219
6 attn_q 511.80 0.0000 34.5247 0.1000 0.7756 99.98% 5120 9.8035 79.56% 7.4219
36 ffn_down 493.83 0.0075 5.3032 0.0357 0.0743 100.00% 13824 13.0480 94.86% 44.4879
23 ffn_gate 488.15 0.0045 0.7809 0.0953 0.0255 100.00% 5120 12.2943 99.78% 17.9688
23 ffn_up 488.15 0.0045 0.7809 0.0953 0.0255 100.00% 5120 12.2943 99.78% 17.9688
22 ffn_up 461.78 0.0070 0.8592 0.0902 0.0298 100.00% 5120 12.2841 99.69% 12.8906
22 ffn_gate 461.78 0.0070 0.8592 0.0902 0.0298 100.00% 5120 12.2841 99.69% 12.8906
5 attn_k 461.03 0.0000 27.0042 0.0900 0.7100 99.96% 5120 9.4849 76.98% 8.9844
5 attn_v 461.03 0.0000 27.0042 0.0900 0.7100 99.96% 5120 9.4849 76.98% 8.9844
5 attn_q 461.03 0.0000 27.0042 0.0900 0.7100 99.96% 5120 9.4849 76.98% 8.9844
21 ffn_up 432.89 0.0068 1.0011 0.0845 0.0359 100.00% 5120 12.2675 99.56% 10.5469
21 ffn_gate 432.89 0.0068 1.0011 0.0845 0.0359 100.00% 5120 12.2675 99.56% 10.5469
4 attn_k 416.60 0.0000 25.1496 0.0814 0.6785 99.96% 5120 9.2580 75.13% 9.9609
4 attn_v 416.60 0.0000 25.1496 0.0814 0.6785 99.96% 5120 9.2580 75.13% 9.9609
4 attn_q 416.60 0.0000 25.1496 0.0814 0.6785 99.96% 5120 9.2580 75.13% 9.9609
35 ffn_down 411.85 0.0053 7.9751 0.0298 0.0819 100.00% 13824 13.0757 95.06% 28.2841
20 ffn_gate 403.55 0.0171 1.2925 0.0788 0.0435 100.00% 5120 12.2438 99.37% 8.7891
20 ffn_up 403.55 0.0171 1.2925 0.0788 0.0435 100.00% 5120 12.2438 99.37% 8.7891
19 ffn_gate 382.99 0.0103 1.2834 0.0748 0.0409 100.00% 5120 12.2452 99.38% 8.9844
19 ffn_up 382.99 0.0103 1.2834 0.0748 0.0409 100.00% 5120 12.2452 99.38% 8.9844
18 ffn_gate 360.11 0.0086 1.1621 0.0703 0.0419 100.00% 5120 12.2340 99.29% 9.1797
18 ffn_up 360.11 0.0086 1.1621 0.0703 0.0419 100.00% 5120 12.2340 99.29% 9.1797
34 ffn_down 343.68 0.0057 1.9176 0.0249 0.0342 100.00% 13824 13.3093 96.76% 43.4028
17 ffn_up 336.38 0.0122 1.4292 0.0657 0.0480 100.00% 5120 12.2045 99.05% 8.5938
17 ffn_gate 336.38 0.0122 1.4292 0.0657 0.0480 100.00% 5120 12.2045 99.05% 8.5938
16 ffn_gate 311.79 0.0122 1.7776 0.0609 0.0573 100.00% 5120 12.1552 98.65% 8.3984
16 ffn_up 311.79 0.0122 1.7776 0.0609 0.0573 100.00% 5120 12.1552 98.65% 8.3984
33 ffn_down 307.16 0.0097 7.3743 0.0222 0.0698 100.00% 13824 13.2318 96.20% 14.9740
15 ffn_up 288.24 0.0109 2.0467 0.0563 0.0615 100.00% 5120 12.1205 98.37% 8.0078
15 ffn_gate 288.24 0.0109 2.0467 0.0563 0.0615 100.00% 5120 12.1205 98.37% 8.0078
14 ffn_up 272.26 0.0103 2.6254 0.0532 0.0710 100.00% 5120 12.0645 97.91% 7.8125
14 ffn_gate 272.26 0.0103 2.6254 0.0532 0.0710 100.00% 5120 12.0645 97.91% 7.8125
32 ffn_down 270.24 0.0095 0.7403 0.0195 0.0193 100.00% 13824 13.4759 97.97% 46.8027
13 ffn_up 254.86 0.0113 2.6888 0.0498 0.0725 100.00% 5120 12.0363 97.68% 7.2266
13 ffn_gate 254.86 0.0113 2.6888 0.0498 0.0725 100.00% 5120 12.0363 97.68% 7.2266
31 ffn_down 250.66 0.0086 0.9231 0.0181 0.0188 100.00% 13824 13.4937 98.10% 43.7645
12 ffn_gate 239.95 0.0166 2.6666 0.0469 0.0752 100.00% 5120 11.9867 97.28% 7.2266
12 ffn_up 239.95 0.0166 2.6666 0.0469 0.0752 100.00% 5120 11.9867 97.28% 7.2266
30 ffn_down 237.44 0.0079 0.5803 0.0172 0.0149 100.00% 13824 13.5080 98.20% 50.7812
11 ffn_up 230.23 0.0148 2.8725 0.0450 0.0777 100.00% 5120 11.9567 97.04% 7.0312
11 ffn_gate 230.23 0.0148 2.8725 0.0450 0.0777 100.00% 5120 11.9567 97.04% 7.0312
29 ffn_down 227.64 0.0074 6.8119 0.0165 0.0593 100.00% 13824 13.3079 96.75% 7.5231
10 ffn_up 220.84 0.0059 2.3218 0.0431 0.0624 100.00% 5120 12.0437 97.74% 7.4219
10 ffn_gate 220.84 0.0059 2.3218 0.0431 0.0624 100.00% 5120 12.0437 97.74% 7.4219
39 attn_output 213.80 0.0049 1.7995 0.0418 0.0570 100.00% 5120 11.6992 94.95% 90.6250
3 attn_k 212.66 0.0000 17.1690 0.0415 0.4298 99.98% 5120 8.5517 69.40% 7.0312
3 attn_q 212.66 0.0000 17.1690 0.0415 0.4298 99.98% 5120 8.5517 69.40% 7.0312
3 attn_v 212.66 0.0000 17.1690 0.0415 0.4298 99.98% 5120 8.5517 69.40% 7.0312
9 ffn_gate 211.89 0.0064 1.9591 0.0414 0.0548 100.00% 5120 12.0596 97.87% 7.6172
9 ffn_up 211.89 0.0064 1.9591 0.0414 0.0548 100.00% 5120 12.0596 97.87% 7.6172
2 attn_v 211.81 0.0000 13.5470 0.0414 0.5105 99.86% 5120 7.5117 60.96% 5.0781
2 attn_q 211.81 0.0000 13.5470 0.0414 0.5105 99.86% 5120 7.5117 60.96% 5.0781
2 attn_k 211.81 0.0000 13.5470 0.0414 0.5105 99.86% 5120 7.5117 60.96% 5.0781
28 ffn_down 210.59 0.0071 0.7934 0.0152 0.0169 100.00% 13824 13.4661 97.90% 42.6794
27 ffn_down 204.54 0.0061 8.1876 0.0148 0.0705 100.00% 13824 13.2151 96.08% 4.0509
26 ffn_down 195.28 0.0058 3.9368 0.0141 0.0383 100.00% 13824 13.2929 96.64% 14.0336
8 ffn_gate 189.36 0.0115 1.6949 0.0370 0.0461 100.00% 5120 12.0880 98.10% 7.8125
8 ffn_up 189.36 0.0115 1.6949 0.0370 0.0461 100.00% 5120 12.0880 98.10% 7.8125
38 attn_output 185.57 0.0016 1.4583 0.0362 0.0547 100.00% 5120 11.5948 94.10% 53.1250
25 ffn_down 177.29 0.0051 0.8608 0.0128 0.0142 100.00% 13824 13.4412 97.72% 47.8877
24 ffn_down 167.83 0.0045 0.8385 0.0121 0.0184 100.00% 13824 13.3351 96.95% 32.1904
7 ffn_up 167.13 0.0085 1.2138 0.0326 0.0395 100.00% 5120 12.0921 98.13% 6.8359
7 ffn_gate 167.13 0.0085 1.2138 0.0326 0.0395 100.00% 5120 12.0921 98.13% 6.8359
23 ffn_down 161.22 0.0045 1.2035 0.0117 0.0192 100.00% 13824 13.3102 96.77% 31.1777
22 ffn_down 150.90 0.0038 0.8320 0.0109 0.0151 100.00% 13824 13.3489 97.05% 39.8582
1 attn_k 148.63 0.0000 22.4289 0.0290 0.5286 99.80% 5120 5.8192 47.23% 3.7109
1 attn_q 148.63 0.0000 22.4289 0.0290 0.5286 99.80% 5120 5.8192 47.23% 3.7109
1 attn_v 148.63 0.0000 22.4289 0.0290 0.5286 99.80% 5120 5.8192 47.23% 3.7109
21 ffn_down 147.96 0.0036 1.6641 0.0107 0.0245 100.00% 13824 13.1859 95.86% 19.8206
6 ffn_up 143.83 0.0134 0.7677 0.0281 0.0279 100.00% 5120 12.1471 98.58% 7.4219
6 ffn_gate 143.83 0.0134 0.7677 0.0281 0.0279 100.00% 5120 12.1471 98.58% 7.4219
37 attn_output 127.32 0.0007 1.2476 0.0249 0.0382 100.00% 5120 11.6690 94.70% 36.5234
36 attn_output 124.95 0.0022 0.7087 0.0244 0.0317 100.00% 5120 11.7572 95.42% 64.4531
20 ffn_down 119.81 0.0030 0.3580 0.0087 0.0095 100.00% 13824 13.4021 97.44% 53.0237
5 ffn_gate 114.26 0.0015 0.5836 0.0223 0.0180 100.00% 5120 12.1927 98.95% 8.2031
5 ffn_up 114.26 0.0015 0.5836 0.0223 0.0180 100.00% 5120 12.1927 98.95% 8.2031
19 ffn_down 110.82 0.0026 0.5981 0.0080 0.0117 100.00% 13824 13.3221 96.85% 37.1817
18 ffn_down 100.26 0.0026 1.6162 0.0073 0.0172 100.00% 13824 13.2686 96.46% 18.5185
17 ffn_down 91.33 0.0017 0.9219 0.0066 0.0102 100.00% 13824 13.3992 97.41% 30.8883
4 ffn_gate 87.21 0.0002 0.2963 0.0170 0.0101 100.00% 5120 12.2345 99.29% 10.5469
4 ffn_up 87.21 0.0002 0.2963 0.0170 0.0101 100.00% 5120 12.2345 99.29% 10.5469
16 ffn_down 83.68 0.0018 0.3795 0.0061 0.0068 100.00% 13824 13.4214 97.58% 46.2240
35 attn_output 80.93 0.0009 0.3628 0.0158 0.0178 100.00% 5120 11.8167 95.90% 67.3828
15 ffn_down 69.29 0.0015 0.4523 0.0050 0.0060 100.00% 13824 13.4392 97.70% 43.4028
34 attn_output 68.75 0.0018 0.3458 0.0134 0.0159 100.00% 5120 11.7593 95.43% 90.4297
3 ffn_gate 63.74 0.0000 0.9831 0.0124 0.0160 100.00% 5120 12.1360 98.49% 7.8125
3 ffn_up 63.74 0.0000 0.9831 0.0124 0.0160 100.00% 5120 12.1360 98.49% 7.8125
21 attn_output 63.53 0.0021 0.5559 0.0124 0.0145 100.00% 5120 11.8760 96.38% 53.7109
15 attn_output 63.25 0.0013 0.1506 0.0124 0.0118 100.00% 5120 11.9061 96.62% 81.6406
14 ffn_down 60.91 0.0014 0.3164 0.0044 0.0045 100.00% 13824 13.4907 98.08% 48.8281
32 attn_output 60.46 0.0005 0.4920 0.0118 0.0169 100.00% 5120 11.7173 95.09% 67.5781
14 attn_output 59.20 0.0033 0.2145 0.0116 0.0095 100.00% 5120 12.0477 97.77% 57.4219
31 attn_output 58.85 0.0005 0.4893 0.0115 0.0167 100.00% 5120 11.6401 94.47% 50.1953
16 attn_output 58.58 0.0012 0.1902 0.0114 0.0095 100.00% 5120 12.0063 97.44% 88.8672
17 attn_output 58.46 0.0005 0.2506 0.0114 0.0106 100.00% 5120 11.9494 96.98% 61.5234
33 attn_output 53.96 0.0014 0.2382 0.0105 0.0079 100.00% 5120 12.0467 97.77% 108.9844
24 attn_output 53.59 0.0005 0.5380 0.0105 0.0263 100.00% 5120 11.1589 90.56% 33.2031
13 ffn_down 53.16 0.0012 0.1572 0.0038 0.0035 100.00% 13824 13.5008 98.15% 50.1302
20 attn_output 52.53 0.0015 0.2461 0.0103 0.0114 100.00% 5120 11.8431 96.11% 75.1953
30 attn_output 50.85 0.0007 0.2020 0.0099 0.0085 100.00% 5120 11.9906 97.31% 95.5078
12 ffn_down 46.43 0.0004 0.0648 0.0034 0.0025 100.00% 13824 13.5358 98.41% 70.0231
11 ffn_down 44.24 0.0008 0.4759 0.0032 0.0049 100.00% 13824 13.4624 97.87% 23.6545
13 attn_output 43.56 0.0003 0.1377 0.0085 0.0073 100.00% 5120 11.9801 97.23% 63.0859
12 attn_output 43.40 0.0009 0.1860 0.0085 0.0078 100.00% 5120 11.9642 97.10% 72.8516
11 attn_output 42.74 0.0006 0.5558 0.0083 0.0176 100.00% 5120 11.4660 93.05% 50.1953
25 attn_output 42.61 0.0006 0.3259 0.0083 0.0095 100.00% 5120 11.8723 96.35% 69.9219
23 attn_output 42.58 0.0005 0.1831 0.0083 0.0095 100.00% 5120 11.7843 95.64% 62.6953
19 attn_output 42.16 0.0004 0.2335 0.0082 0.0076 100.00% 5120 12.0083 97.45% 41.7969
26 attn_output 41.73 0.0003 0.2064 0.0082 0.0076 100.00% 5120 11.9276 96.80% 79.4922
27 attn_output 41.03 0.0003 0.8884 0.0080 0.0141 100.00% 5120 11.8718 96.35% 25.7812
22 attn_output 40.76 0.0003 0.1580 0.0080 0.0071 100.00% 5120 11.8881 96.48% 99.6094
18 attn_output 40.68 0.0014 0.2471 0.0079 0.0069 100.00% 5120 12.0482 97.78% 57.2266
10 ffn_down 39.95 0.0006 0.1846 0.0029 0.0025 100.00% 13824 13.5468 98.49% 48.9728
2 ffn_up 38.98 0.0000 0.1812 0.0076 0.0036 100.00% 5120 12.2648 99.54% 7.4219
2 ffn_gate 38.98 0.0000 0.1812 0.0076 0.0036 100.00% 5120 12.2648 99.54% 7.4219
29 attn_output 38.72 0.0016 0.0977 0.0076 0.0053 100.00% 5120 12.0489 97.78% 130.2734
28 attn_output 38.28 0.0006 0.1802 0.0075 0.0064 100.00% 5120 11.9516 96.99% 131.0547
10 attn_output 36.31 0.0004 0.1589 0.0071 0.0085 100.00% 5120 11.7977 95.75% 60.7422
9 ffn_down 36.00 0.0006 0.7241 0.0026 0.0067 100.00% 13824 13.3678 97.19% 10.7784
8 ffn_down 30.51 0.0004 0.3576 0.0022 0.0042 100.00% 13824 13.3650 97.17% 20.4716
9 attn_output 25.89 0.0003 0.1683 0.0051 0.0074 100.00% 5120 11.6535 94.58% 51.5625
7 ffn_down 25.57 0.0002 0.3904 0.0018 0.0055 100.00% 13824 13.1784 95.81% 9.4763
6 ffn_down 18.29 0.0003 0.1456 0.0013 0.0018 100.00% 13824 13.4276 97.62% 35.3733
0 attn_q 18.29 0.0000 5.9196 0.0036 0.0950 94.32% 5120 4.4566 36.17% 4.8828
0 attn_k 18.29 0.0000 5.9196 0.0036 0.0950 94.32% 5120 4.4566 36.17% 4.8828
0 attn_v 18.29 0.0000 5.9196 0.0036 0.0950 94.32% 5120 4.4566 36.17% 4.8828
8 attn_output 17.56 0.0001 0.0978 0.0034 0.0039 100.00% 5120 11.8420 96.10% 55.8594
1 ffn_gate 17.11 0.0000 0.5277 0.0033 0.0083 100.00% 5120 11.9241 96.77% 5.0781
1 ffn_up 17.11 0.0000 0.5277 0.0033 0.0083 100.00% 5120 11.9241 96.77% 5.0781
7 attn_output 13.82 0.0001 0.0629 0.0027 0.0034 100.00% 5120 11.7857 95.65% 51.5625
5 ffn_down 12.69 0.0001 0.3858 0.0009 0.0034 100.00% 13824 13.2589 96.39% 7.2338
6 attn_output 9.60 0.0000 0.0566 0.0019 0.0026 100.00% 5120 11.6751 94.75% 54.8828
4 ffn_down 7.48 0.0001 0.0299 0.0005 0.0006 100.00% 13824 13.4405 97.71% 54.4705
0 ffn_gate 7.24 0.0000 0.3432 0.0014 0.0109 99.94% 5120 9.7065 78.77% 6.4453
0 ffn_up 7.24 0.0000 0.3432 0.0014 0.0109 99.94% 5120 9.7065 78.77% 6.4453
5 attn_output 6.31 0.0000 0.0573 0.0012 0.0018 100.00% 5120 11.7298 95.19% 33.3984
4 attn_output 4.28 0.0000 0.0411 0.0008 0.0016 100.00% 5120 11.5801 93.98% 32.4219
0 ffn_down 4.25 0.0000 3.6589 0.0003 0.0312 99.73% 13824 1.6508 12.00% 0.1447
3 attn_output 3.57 0.0000 0.0637 0.0007 0.0025 100.00% 5120 10.5307 85.46% 26.9531
2 ffn_down 2.67 0.0000 0.0087 0.0002 0.0002 100.00% 13824 13.3953 97.39% 44.5602
1 ffn_down 2.13 0.0000 0.6453 0.0002 0.0061 100.00% 13824 8.4307 61.29% 0.3617
2 attn_output 1.46 0.0000 0.0200 0.0003 0.0005 100.00% 5120 11.4702 93.09% 42.7734
1 attn_output 1.05 0.0000 0.0229 0.0002 0.0006 100.00% 5120 10.2723 83.37% 50.5859
0 attn_output 0.46 0.0000 0.0577 0.0001 0.0011 90.25% 5120 7.1328 57.89% 12.8906
Graph of Entropy & ZD Score by Layer and Tensor
Discussion
So I'm not sure how best to read these stats and interpret the graphs. According to the Layer-wise Quantization Paper, the top 3 most important layers according to their LIM Score are 1, 2, and 40, with the least important being 32, 33, and 34. However, I don't see a correlation in the graphs, at least with Entropy and what you are calling "ZD Score"*
*Just to confirm, what you are calling "ZD Score" is calculated using the imatrix activations, whereas in the paper it is defined over *all weights* in a given layer (my emphasis):
We examine the proportion of weights in a layer exhibiting a z-score greater than 1, where for layer $L_i$, $w_i$ represents an individual weight, µ the mean of the weights, and σ their standard deviation.
Anyway, just some observations. I didn't slice the data to look at the other metrics nor try to normalize all the tensors of a given layer together into a single "layer" score.
Fascinating stuff, hopefully I can dig in more later this week! Cheers!
Fascinating stuff indeed @ubergarm, and apparently not without controversy 🙃
In a room full of PhDs, I'd be Howard Wolowitz 🤣 so, dear reader, please take everything that follows with the proverbial pinch of salt, and do not pull back from pointing out errors or gaps in my logic.
The notion of determining the importance of a specific tensor in a specific layer by somehow measuring the degree of transformation of the hidden states (be it with importance scores, cosine similarity, etc.) as the tokens "flow" from that layer to the next seems, intuitively, reasonable to me and, as a few have correctly pointed out, having access to the weights during those transformations will yield significantly better measurements.
In my case however, and for the reasons explained above, I'm left with the next best option, which is the sum of the squared activations (imatrix importance scores) for specific tensors in specific layers. That's what I'm calling Σ(Bias), in reference to the total "power" of a vector of discrete signals (the sum of the squared elements in the vector). The intuition is that the more bias there is, the busier the tensor. That's as far as I dare to take the EE analogy 😉.
I'm emphasising specific tensor & specific layer to signify that the stats should be used to compare between tensors of the same type only. In other words, thinking that attn_k in layer X has more influence during inference than attn_k in layer Y because its Σ(Bias) is larger makes sense, whilst concluding the same between attn_k and ffn_down does not. I've just pushed a change in how the stats are displayed to better convey this.
To validate the hypothesis we of course need lots of tests, but so far, and based solely on layer-wise quantizing DeepSeek-R1-Distill-Qwen-7B, it seems to hold (approach and results in my previous comment 👆 and corresponding imatrix stats at the end 👇 ). Testing other models is needed, but so far so good.
I have indeed taken the paper's ZD concept and applied it to the activations. Their Z-score Distribution (a better name would be z-score density, IMO) is nothing more than the percentage of elements that have a z-score greater than 1 standard deviation from the mean.
I haven't had a chance to really grok the relevance of this metric, but suspect that in combination with the normalized entropy it may give insights into whole layer scoring, but that's a (pruning) story for another day...
Computing statistics for imatrix-DeepSeek-R1-Distill-Qwen-7B-small.dat (197 tensors)
Layer Tensor Σ(Bias) Min Max μ σ % Active N Entropy E (norm) ZD Score
==========================================================================================================================================================================
27 attn_k 5141.31 0.0578 405.6018 1.4345 8.5063 100.00% 3584 8.2161 69.58% 5.05%
26 attn_k 3514.78 0.0014 336.0238 0.9807 6.3577 100.00% 3584 8.6701 73.43% 4.77%
23 attn_k 2577.34 0.0711 107.3467 0.7191 2.8482 100.00% 3584 9.2976 78.74% 5.36%
25 attn_k 2416.49 0.0523 192.7465 0.6742 3.6958 100.00% 3584 9.4202 79.78% 4.85%
24 attn_k 2345.51 0.0433 235.1290 0.6544 4.3505 100.00% 3584 9.3335 79.05% 2.68%
22 attn_k 2341.42 0.0616 106.0560 0.6533 2.9773 100.00% 3584 9.3443 79.14% 2.87%
21 attn_k 1465.48 0.0488 65.1086 0.4089 1.8415 100.00% 3584 9.7659 82.71% 1.95%
19 attn_k 1354.92 0.0160 64.9419 0.3780 2.0088 100.00% 3584 9.4633 80.15% 1.79%
20 attn_k 1271.46 0.0245 58.6785 0.3548 1.7495 100.00% 3584 9.6939 82.10% 1.84%
16 attn_k 1217.92 0.0000 68.7396 0.3398 1.8574 100.00% 3584 9.2844 78.63% 1.81%
17 attn_k 1193.92 0.0139 50.0219 0.3331 1.5332 100.00% 3584 9.6450 81.69% 1.90%
14 attn_k 1188.44 0.0079 48.7036 0.3316 1.4011 100.00% 3584 9.6869 82.04% 2.37%
18 attn_k 1001.68 0.0072 54.0705 0.2795 1.4768 100.00% 3584 9.6582 81.80% 1.48%
15 attn_k 923.17 0.0020 32.2622 0.2576 1.1821 100.00% 3584 9.4031 79.64% 2.46%
8 attn_k 784.03 0.0082 12.9517 0.2188 0.6849 100.00% 3584 10.1589 86.04% 2.85%
13 attn_k 752.92 0.0000 25.2086 0.2101 0.7649 99.97% 3584 10.2496 86.81% 1.87%
12 attn_k 738.25 0.0061 24.0529 0.2060 0.7757 100.00% 3584 10.1182 85.69% 1.90%
9 attn_k 733.39 0.0000 16.4946 0.2046 0.6262 100.00% 3584 10.5356 89.23% 2.20%
4 attn_k 689.25 0.0000 26.4802 0.1923 1.1755 98.80% 3584 8.4224 71.33% 1.76%
5 attn_k 687.23 0.0000 31.9846 0.1917 0.7180 99.89% 3584 10.1248 85.75% 2.54%
11 attn_k 685.48 0.0080 17.6951 0.1913 0.7004 100.00% 3584 10.0526 85.14% 2.20%
10 attn_k 630.31 0.0076 16.3245 0.1759 0.6634 100.00% 3584 10.1971 86.36% 2.01%
7 attn_k 615.92 0.0000 12.5285 0.1719 0.5429 100.00% 3584 10.4200 88.25% 1.87%
6 attn_k 499.66 0.0000 16.2125 0.1394 0.6909 99.89% 3584 9.6434 81.67% 1.31%
3 attn_k 308.74 0.0000 11.9797 0.0861 0.3259 98.07% 3584 9.5947 81.26% 4.94%
2 attn_k 258.92 0.0000 7.6345 0.0722 0.2554 94.81% 3584 9.8862 83.73% 3.26%
0 attn_k 120.98 0.0000 11.3855 0.0338 0.1961 99.97% 3584 10.8332 91.75% 0.39%
1 attn_k 68.39 0.0000 7.4842 0.0191 0.1749 86.05% 3584 7.8550 66.53% 1.34%
27 attn_output 5664.79 0.1570 47.1631 1.5806 2.8290 100.00% 3584 10.9222 92.50% 5.97%
26 attn_output 1455.48 0.0136 36.9886 0.4061 1.5633 100.00% 3584 10.6218 89.96% 0.67%
23 attn_output 1162.73 0.0292 28.5696 0.3244 1.2175 100.00% 3584 10.4851 88.80% 0.78%
25 attn_output 1087.16 0.0556 39.0104 0.3033 1.6812 100.00% 3584 10.1333 85.82% 0.25%
24 attn_output 802.42 0.0178 12.8809 0.2239 0.5729 100.00% 3584 10.9313 92.58% 1.53%
21 attn_output 583.25 0.0091 3.4697 0.1627 0.2657 100.00% 3584 10.8242 91.67% 7.00%
19 attn_output 574.93 0.0103 4.3428 0.1604 0.3092 100.00% 3584 10.6549 90.24% 7.37%
18 attn_output 498.09 0.0091 5.5657 0.1390 0.2735 100.00% 3584 10.7222 90.81% 7.34%
22 attn_output 394.58 0.0023 3.4242 0.1101 0.1788 100.00% 3584 11.0570 93.65% 4.05%
20 attn_output 387.68 0.0086 6.0710 0.1082 0.2653 100.00% 3584 10.8025 91.49% 2.59%
16 attn_output 313.86 0.0044 4.4249 0.0876 0.1933 100.00% 3584 10.7883 91.37% 3.93%
15 attn_output 297.66 0.0015 2.4456 0.0831 0.1524 100.00% 3584 10.8274 91.70% 5.41%
13 attn_output 272.14 0.0090 4.0031 0.0759 0.1406 100.00% 3584 10.8771 92.12% 6.70%
17 attn_output 267.64 0.0045 5.3183 0.0747 0.2063 100.00% 3584 10.5521 89.37% 2.93%
14 attn_output 259.32 0.0005 12.2898 0.0724 0.2893 100.00% 3584 10.1023 85.56% 2.73%
12 attn_output 201.57 0.0050 3.6905 0.0562 0.1336 100.00% 3584 10.6677 90.35% 5.22%
11 attn_output 184.43 0.0049 2.6849 0.0515 0.0968 100.00% 3584 11.0717 93.77% 3.71%
7 attn_output 169.21 0.0022 0.4015 0.0472 0.0414 100.00% 3584 11.3066 95.76% 14.56%
9 attn_output 166.98 0.0021 1.5864 0.0466 0.0605 100.00% 3584 11.1723 94.62% 5.69%
10 attn_output 165.81 0.0026 0.9828 0.0463 0.0536 100.00% 3584 11.3118 95.80% 5.94%
8 attn_output 159.54 0.0019 1.1831 0.0445 0.0583 100.00% 3584 11.1678 94.58% 7.00%
0 attn_output 131.48 0.0005 6.6774 0.0367 0.2584 100.00% 3584 8.9836 76.08% 0.98%
6 attn_output 86.10 0.0007 0.3468 0.0240 0.0258 100.00% 3584 11.2370 95.17% 7.65%
3 attn_output 74.09 0.0010 0.5955 0.0207 0.0225 100.00% 3584 11.2807 95.54% 8.45%
4 attn_output 51.35 0.0002 0.9319 0.0143 0.0335 100.00% 3584 10.8659 92.03% 2.20%
5 attn_output 46.97 0.0011 0.4940 0.0131 0.0244 100.00% 3584 10.9951 93.12% 4.19%
2 attn_output 36.31 0.0010 0.9631 0.0101 0.0260 100.00% 3584 10.8809 92.15% 3.10%
1 attn_output 23.60 0.0001 0.4081 0.0066 0.0181 100.00% 3584 10.5325 89.20% 3.18%
27 attn_q 5141.31 0.0578 405.6018 1.4345 8.5063 100.00% 3584 8.2161 69.58% 5.05%
26 attn_q 3514.78 0.0014 336.0238 0.9807 6.3577 100.00% 3584 8.6701 73.43% 4.77%
23 attn_q 2577.34 0.0711 107.3467 0.7191 2.8482 100.00% 3584 9.2976 78.74% 5.36%
25 attn_q 2416.49 0.0523 192.7465 0.6742 3.6958 100.00% 3584 9.4202 79.78% 4.85%
24 attn_q 2345.51 0.0433 235.1290 0.6544 4.3505 100.00% 3584 9.3335 79.05% 2.68%
22 attn_q 2341.42 0.0616 106.0560 0.6533 2.9773 100.00% 3584 9.3443 79.14% 2.87%
21 attn_q 1465.48 0.0488 65.1086 0.4089 1.8415 100.00% 3584 9.7659 82.71% 1.95%
19 attn_q 1354.92 0.0160 64.9419 0.3780 2.0088 100.00% 3584 9.4633 80.15% 1.79%
20 attn_q 1271.46 0.0245 58.6785 0.3548 1.7495 100.00% 3584 9.6939 82.10% 1.84%
16 attn_q 1217.92 0.0000 68.7396 0.3398 1.8574 100.00% 3584 9.2844 78.63% 1.81%
17 attn_q 1193.92 0.0139 50.0219 0.3331 1.5332 100.00% 3584 9.6450 81.69% 1.90%
14 attn_q 1188.44 0.0079 48.7036 0.3316 1.4011 100.00% 3584 9.6869 82.04% 2.37%
18 attn_q 1001.68 0.0072 54.0705 0.2795 1.4768 100.00% 3584 9.6582 81.80% 1.48%
15 attn_q 923.17 0.0020 32.2622 0.2576 1.1821 100.00% 3584 9.4031 79.64% 2.46%
8 attn_q 784.03 0.0082 12.9517 0.2188 0.6849 100.00% 3584 10.1589 86.04% 2.85%
13 attn_q 752.92 0.0000 25.2086 0.2101 0.7649 99.97% 3584 10.2496 86.81% 1.87%
12 attn_q 738.25 0.0061 24.0529 0.2060 0.7757 100.00% 3584 10.1182 85.69% 1.90%
9 attn_q 733.39 0.0000 16.4946 0.2046 0.6262 100.00% 3584 10.5356 89.23% 2.20%
4 attn_q 689.25 0.0000 26.4802 0.1923 1.1755 98.80% 3584 8.4224 71.33% 1.76%
5 attn_q 687.23 0.0000 31.9846 0.1917 0.7180 99.89% 3584 10.1248 85.75% 2.54%
11 attn_q 685.48 0.0080 17.6951 0.1913 0.7004 100.00% 3584 10.0526 85.14% 2.20%
10 attn_q 630.31 0.0076 16.3245 0.1759 0.6634 100.00% 3584 10.1971 86.36% 2.01%
7 attn_q 615.92 0.0000 12.5285 0.1719 0.5429 100.00% 3584 10.4200 88.25% 1.87%
6 attn_q 499.66 0.0000 16.2125 0.1394 0.6909 99.89% 3584 9.6434 81.67% 1.31%
3 attn_q 308.74 0.0000 11.9797 0.0861 0.3259 98.07% 3584 9.5947 81.26% 4.94%
2 attn_q 258.92 0.0000 7.6345 0.0722 0.2554 94.81% 3584 9.8862 83.73% 3.26%
0 attn_q 120.98 0.0000 11.3855 0.0338 0.1961 99.97% 3584 10.8332 91.75% 0.39%
1 attn_q 68.39 0.0000 7.4842 0.0191 0.1749 86.05% 3584 7.8550 66.53% 1.34%
27 attn_v 5141.31 0.0578 405.6018 1.4345 8.5063 100.00% 3584 8.2161 69.58% 5.05%
26 attn_v 3514.78 0.0014 336.0238 0.9807 6.3577 100.00% 3584 8.6701 73.43% 4.77%
23 attn_v 2577.34 0.0711 107.3467 0.7191 2.8482 100.00% 3584 9.2976 78.74% 5.36%
25 attn_v 2416.49 0.0523 192.7465 0.6742 3.6958 100.00% 3584 9.4202 79.78% 4.85%
24 attn_v 2345.51 0.0433 235.1290 0.6544 4.3505 100.00% 3584 9.3335 79.05% 2.68%
22 attn_v 2341.42 0.0616 106.0560 0.6533 2.9773 100.00% 3584 9.3443 79.14% 2.87%
21 attn_v 1465.48 0.0488 65.1086 0.4089 1.8415 100.00% 3584 9.7659 82.71% 1.95%
19 attn_v 1354.92 0.0160 64.9419 0.3780 2.0088 100.00% 3584 9.4633 80.15% 1.79%
20 attn_v 1271.46 0.0245 58.6785 0.3548 1.7495 100.00% 3584 9.6939 82.10% 1.84%
16 attn_v 1217.92 0.0000 68.7396 0.3398 1.8574 100.00% 3584 9.2844 78.63% 1.81%
17 attn_v 1193.92 0.0139 50.0219 0.3331 1.5332 100.00% 3584 9.6450 81.69% 1.90%
14 attn_v 1188.44 0.0079 48.7036 0.3316 1.4011 100.00% 3584 9.6869 82.04% 2.37%
18 attn_v 1001.68 0.0072 54.0705 0.2795 1.4768 100.00% 3584 9.6582 81.80% 1.48%
15 attn_v 923.17 0.0020 32.2622 0.2576 1.1821 100.00% 3584 9.4031 79.64% 2.46%
8 attn_v 784.03 0.0082 12.9517 0.2188 0.6849 100.00% 3584 10.1589 86.04% 2.85%
13 attn_v 752.92 0.0000 25.2086 0.2101 0.7649 99.97% 3584 10.2496 86.81% 1.87%
12 attn_v 738.25 0.0061 24.0529 0.2060 0.7757 100.00% 3584 10.1182 85.69% 1.90%
9 attn_v 733.39 0.0000 16.4946 0.2046 0.6262 100.00% 3584 10.5356 89.23% 2.20%
4 attn_v 689.25 0.0000 26.4802 0.1923 1.1755 98.80% 3584 8.4224 71.33% 1.76%
5 attn_v 687.23 0.0000 31.9846 0.1917 0.7180 99.89% 3584 10.1248 85.75% 2.54%
11 attn_v 685.48 0.0080 17.6951 0.1913 0.7004 100.00% 3584 10.0526 85.14% 2.20%
10 attn_v 630.31 0.0076 16.3245 0.1759 0.6634 100.00% 3584 10.1971 86.36% 2.01%
7 attn_v 615.92 0.0000 12.5285 0.1719 0.5429 100.00% 3584 10.4200 88.25% 1.87%
6 attn_v 499.66 0.0000 16.2125 0.1394 0.6909 99.89% 3584 9.6434 81.67% 1.31%
3 attn_v 308.74 0.0000 11.9797 0.0861 0.3259 98.07% 3584 9.5947 81.26% 4.94%
2 attn_v 258.92 0.0000 7.6345 0.0722 0.2554 94.81% 3584 9.8862 83.73% 3.26%
0 attn_v 120.98 0.0000 11.3855 0.0338 0.1961 99.97% 3584 10.8332 91.75% 0.39%
1 attn_v 68.39 0.0000 7.4842 0.0191 0.1749 86.05% 3584 7.8550 66.53% 1.34%
27 ffn_down 355884.75 0.0159 6837.1255 18.7861 148.5242 100.00% 18944 10.7816 75.88% 1.45%
26 ffn_down 181419.47 0.0260 43328.5547 9.5766 321.8018 100.00% 18944 9.1996 64.74% 0.10%
25 ffn_down 38754.11 0.0107 2872.8489 2.0457 36.8919 100.00% 18944 10.0465 70.70% 0.26%
24 ffn_down 19443.91 0.0114 2827.7163 1.0264 21.8617 100.00% 18944 10.4168 73.31% 0.28%
23 ffn_down 12473.19 0.0139 1799.1183 0.6584 13.9010 100.00% 18944 10.7399 75.58% 0.31%
3 ffn_down 10822.42 0.0001 989.6157 0.5713 12.3155 100.00% 18944 6.5990 46.44% 0.57%
22 ffn_down 8961.94 0.0151 933.6822 0.4731 7.0275 100.00% 18944 11.4126 80.32% 0.62%
21 ffn_down 3950.82 0.0160 84.4493 0.2086 0.8990 100.00% 18944 12.4962 87.94% 2.68%
4 ffn_down 3913.25 0.0001 1316.8596 0.2066 13.8787 100.00% 18944 3.8574 27.15% 0.07%
20 ffn_down 2835.57 0.0176 104.7299 0.1497 1.0732 100.00% 18944 12.2692 86.35% 1.29%
11 ffn_down 1457.54 0.0101 889.8758 0.0769 6.4658 100.00% 18944 6.2602 44.06% 0.01%
19 ffn_down 1415.36 0.0098 18.9129 0.0747 0.2602 100.00% 18944 13.0607 91.92% 2.28%
18 ffn_down 1172.48 0.0037 47.6772 0.0619 0.3838 100.00% 18944 12.8918 90.73% 1.00%
9 ffn_down 984.12 0.0029 16.6916 0.0519 0.1486 100.00% 18944 13.4853 94.90% 1.73%
17 ffn_down 937.13 0.0120 47.2292 0.0495 0.3552 100.00% 18944 13.1493 92.54% 0.52%
7 ffn_down 741.61 0.0056 5.7790 0.0391 0.0622 100.00% 18944 13.7068 96.46% 4.46%
8 ffn_down 733.18 0.0076 10.2930 0.0387 0.0886 100.00% 18944 13.7211 96.56% 2.00%
15 ffn_down 711.79 0.0076 13.4870 0.0376 0.1184 100.00% 18944 13.4602 94.73% 1.81%
16 ffn_down 711.00 0.0110 8.7637 0.0375 0.0839 100.00% 18944 13.6264 95.90% 2.70%
6 ffn_down 693.73 0.0018 3.3237 0.0366 0.0686 100.00% 18944 13.4328 94.53% 4.30%
14 ffn_down 674.16 0.0091 4.7583 0.0356 0.0729 100.00% 18944 13.5277 95.20% 3.30%
12 ffn_down 628.72 0.0093 11.2445 0.0332 0.1058 100.00% 18944 13.4942 94.97% 1.56%
10 ffn_down 628.51 0.0083 6.9205 0.0332 0.0651 100.00% 18944 13.7703 96.91% 2.26%
13 ffn_down 623.54 0.0070 14.6682 0.0329 0.1219 100.00% 18944 13.4610 94.73% 1.36%
5 ffn_down 425.43 0.0001 65.9802 0.0225 0.4873 100.00% 18944 11.1274 78.31% 0.18%
2 ffn_down 362.44 0.0000 1.6931 0.0191 0.0493 83.49% 18944 12.4262 87.45% 6.37%
1 ffn_down 161.42 0.0000 1.9775 0.0085 0.0446 61.76% 18944 10.9874 77.32% 2.76%
0 ffn_down 93.17 0.0000 1.3730 0.0049 0.0183 100.00% 18944 12.3459 86.88% 3.40%
27 ffn_gate 8203.51 0.0000 728.1832 2.2889 15.0930 99.97% 3584 10.3009 87.24% 0.70%
1 ffn_gate 7649.28 0.0000 4250.2856 2.1343 73.6208 100.00% 3584 3.2319 27.37% 0.22%
5 ffn_gate 5793.46 0.2630 1696.2799 1.6165 30.9683 100.00% 3584 6.4787 54.87% 0.39%
26 ffn_gate 4977.79 0.0001 346.2318 1.3889 7.1514 100.00% 3584 10.3352 87.53% 1.03%
3 ffn_gate 4928.84 0.1158 1178.4656 1.3752 24.0211 100.00% 3584 6.1368 51.97% 0.36%
25 ffn_gate 4345.41 0.0000 391.9680 1.2124 7.5277 100.00% 3584 10.3049 87.28% 0.78%
2 ffn_gate 4145.53 0.0000 1567.8757 1.1567 28.9319 99.97% 3584 4.7073 39.87% 0.28%
4 ffn_gate 3605.02 0.0000 501.6380 1.0059 13.3321 100.00% 3584 7.2867 61.71% 0.45%
24 ffn_gate 3309.81 0.0000 221.9663 0.9235 5.1778 100.00% 3584 10.4013 88.09% 0.92%
23 ffn_gate 2978.69 0.0000 253.4090 0.8311 4.8293 100.00% 3584 10.3654 87.79% 0.73%
22 ffn_gate 2140.05 0.0000 152.6495 0.5971 3.2064 99.97% 3584 10.3133 87.35% 0.78%
9 ffn_gate 1605.21 0.0000 138.4068 0.4479 2.8957 100.00% 3584 10.2616 86.91% 0.45%
21 ffn_gate 1491.98 0.0000 89.1156 0.4163 1.9106 100.00% 3584 10.4835 88.79% 1.00%
20 ffn_gate 1104.55 0.0000 61.6396 0.3082 1.3331 100.00% 3584 10.6024 89.79% 1.23%
19 ffn_gate 923.42 0.0000 54.8742 0.2577 1.1880 100.00% 3584 10.5703 89.52% 1.12%
6 ffn_gate 795.71 0.0000 179.4834 0.2220 3.0785 100.00% 3584 9.2320 78.19% 0.20%
18 ffn_gate 764.25 0.0000 53.1881 0.2132 1.0228 99.97% 3584 10.6846 90.49% 0.81%
17 ffn_gate 696.13 0.0000 44.8044 0.1942 0.8129 99.97% 3584 10.9804 93.00% 0.73%
10 ffn_gate 627.04 0.0000 32.8056 0.1750 0.6096 100.00% 3584 11.1592 94.51% 0.64%
8 ffn_gate 614.92 0.0000 19.9203 0.1716 0.4671 99.97% 3584 11.2903 95.62% 0.50%
16 ffn_gate 612.27 0.0000 32.4457 0.1708 0.6095 99.97% 3584 11.0999 94.01% 0.73%
14 ffn_gate 605.78 0.0000 30.1453 0.1690 0.6111 100.00% 3584 11.0724 93.78% 0.70%
15 ffn_gate 584.58 0.0000 27.9312 0.1631 0.5630 99.97% 3584 11.0423 93.52% 1.00%
7 ffn_gate 581.64 0.0000 21.0149 0.1623 0.5479 99.92% 3584 11.1018 94.02% 0.47%
13 ffn_gate 561.19 0.0000 22.6935 0.1566 0.4936 99.97% 3584 11.1464 94.40% 0.73%
11 ffn_gate 552.21 0.0000 22.1247 0.1541 0.4128 99.97% 3584 11.3085 95.78% 0.67%
12 ffn_gate 531.12 0.0000 16.9325 0.1482 0.3588 99.97% 3584 11.3057 95.75% 0.81%
0 ffn_gate 113.10 0.0000 45.3427 0.0316 0.7576 99.58% 3584 7.6704 64.96% 0.06%
27 ffn_up 8203.51 0.0000 728.1832 2.2889 15.0930 99.97% 3584 10.3009 87.24% 0.70%
1 ffn_up 7649.28 0.0000 4250.2856 2.1343 73.6208 100.00% 3584 3.2319 27.37% 0.22%
5 ffn_up 5793.46 0.2630 1696.2799 1.6165 30.9683 100.00% 3584 6.4787 54.87% 0.39%
26 ffn_up 4977.79 0.0001 346.2318 1.3889 7.1514 100.00% 3584 10.3352 87.53% 1.03%
3 ffn_up 4928.84 0.1158 1178.4656 1.3752 24.0211 100.00% 3584 6.1368 51.97% 0.36%
25 ffn_up 4345.41 0.0000 391.9680 1.2124 7.5277 100.00% 3584 10.3049 87.28% 0.78%
2 ffn_up 4145.53 0.0000 1567.8757 1.1567 28.9319 99.97% 3584 4.7073 39.87% 0.28%
4 ffn_up 3605.02 0.0000 501.6380 1.0059 13.3321 100.00% 3584 7.2867 61.71% 0.45%
24 ffn_up 3309.81 0.0000 221.9663 0.9235 5.1778 100.00% 3584 10.4013 88.09% 0.92%
23 ffn_up 2978.69 0.0000 253.4090 0.8311 4.8293 100.00% 3584 10.3654 87.79% 0.73%
22 ffn_up 2140.05 0.0000 152.6495 0.5971 3.2064 99.97% 3584 10.3133 87.35% 0.78%
9 ffn_up 1605.21 0.0000 138.4068 0.4479 2.8957 100.00% 3584 10.2616 86.91% 0.45%
21 ffn_up 1491.98 0.0000 89.1156 0.4163 1.9106 100.00% 3584 10.4835 88.79% 1.00%
20 ffn_up 1104.55 0.0000 61.6396 0.3082 1.3331 100.00% 3584 10.6024 89.79% 1.23%
19 ffn_up 923.42 0.0000 54.8742 0.2577 1.1880 100.00% 3584 10.5703 89.52% 1.12%
6 ffn_up 795.71 0.0000 179.4834 0.2220 3.0785 100.00% 3584 9.2320 78.19% 0.20%
18 ffn_up 764.25 0.0000 53.1881 0.2132 1.0228 99.97% 3584 10.6846 90.49% 0.81%
17 ffn_up 696.13 0.0000 44.8044 0.1942 0.8129 99.97% 3584 10.9804 93.00% 0.73%
10 ffn_up 627.04 0.0000 32.8056 0.1750 0.6096 100.00% 3584 11.1592 94.51% 0.64%
8 ffn_up 614.92 0.0000 19.9203 0.1716 0.4671 99.97% 3584 11.2903 95.62% 0.50%
16 ffn_up 612.27 0.0000 32.4457 0.1708 0.6095 99.97% 3584 11.0999 94.01% 0.73%
14 ffn_up 605.78 0.0000 30.1453 0.1690 0.6111 100.00% 3584 11.0724 93.78% 0.70%
15 ffn_up 584.58 0.0000 27.9312 0.1631 0.5630 99.97% 3584 11.0423 93.52% 1.00%
7 ffn_up 581.64 0.0000 21.0149 0.1623 0.5479 99.92% 3584 11.1018 94.02% 0.47%
13 ffn_up 561.19 0.0000 22.6935 0.1566 0.4936 99.97% 3584 11.1464 94.40% 0.73%
11 ffn_up 552.21 0.0000 22.1247 0.1541 0.4128 99.97% 3584 11.3085 95.78% 0.67%
12 ffn_up 531.12 0.0000 16.9325 0.1482 0.3588 99.97% 3584 11.3057 95.75% 0.81%
0 ffn_up 113.10 0.0000 45.3427 0.0316 0.7576 99.58% 3584 7.6704 64.96% 0.06%
- output 37753.27 2.9640 3264.6670 10.5338 70.4707 100.00% 3584 9.4367 79.92% 1.42%
Added cosine similarity between same-type tensors with respect to the previous layer (e.g. blk.7.attn_k vs blk.6.attn_k)
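For reference, a minimal NumPy sketch of this kind of metric, assuming `a` and `b` hold the importance scores of the same tensor type in two consecutive layers; the function and the dummy vectors are illustrative only, not the PR's actual implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two importance-score vectors.

    Returns 0.0 when either vector is all zeros, to avoid division by zero.
    """
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

# e.g. scores of blk.6.attn_k vs blk.7.attn_k (dummy data for illustration)
prev_scores = np.random.rand(4096)
curr_scores = np.random.rand(4096)
print(f"CosSim: {cosine_similarity(prev_scores, curr_scores):.4f}")
```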
Apologies for the shotgun approach @ngxson / @jukofyork / @compilade, I'm not sure what the proper process is to request a review. Happy to close or move to draft if this isn't suitable for merging
Thanks @EAddario for keeping this line of research open. One of the (too many) things I'm interested in checking out is how your stats compare across a few competitive imatrix / quant providers, e.g. per this discussion https://github.com/ikawrakow/ik_llama.cpp/discussions/359#discussioncomment-13021815, as folks are digging into the latest quantization trends, how much they differ, and how to meaningfully compare them.
Some way to visualize the results side by side would probably be easier on my brain than staring at the giant tables of stats... I'll noodle on that.
Anyway, much thanks from a fellow hacker engineer! :)
@EAddario
I vibe-coded some python/ImageMagick scripts to visualize the output of your --show-statistics and compare three imatrix files for Qwen3-30B-A3B from myself, unsloth, and bartowski.
I'm not really sure how to read it, but for the most part they show similar patterns, though with some discrepancies. They are not normalized to each other; it's just a stacked mosaic, which was the easiest way to quickly "visually diff" them.
https://gist.github.com/ubergarm/2aa9327f7b98a9b16fef62b4941c7e76
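The gist above has the full scripts; as a much smaller sketch of the same idea, something like the following could parse the per-tensor table printed by `llama-imatrix --show-statistics` (redirected to one text file per provider) and plot the Σ(Bias) column side by side. The column layout is assumed to match the reports shown in this thread, and matplotlib plus the file names are my own choices rather than what the gist actually does:

```python
# imatrix_stats.py -- rough visual diff of several --show-statistics reports
import re
import sys

import matplotlib.pyplot as plt


def load_stats(path: str) -> dict[str, float]:
    """Parse the per-tensor table of a `llama-imatrix --show-statistics` dump.

    Returns {"<layer>.<tensor>": sum_of_activations}, taking the first numeric
    column after the tensor name (Σ(Bias) in the reports shown above).
    """
    row = re.compile(r"^\s*(\S+)\s+(\S+)\s+([0-9]+\.[0-9]+)\s")
    stats = {}
    with open(path) as f:
        for line in f:
            m = row.match(line)
            if m:
                layer, tensor, bias = m.groups()
                stats[f"{layer}.{tensor}"] = float(bias)
    return stats


if __name__ == "__main__":
    # e.g. python imatrix_stats.py ubergarm.txt unsloth.txt bartowski.txt
    per_file = {path: load_stats(path) for path in sys.argv[1:]}
    keys = sorted(set().union(*per_file.values()))
    for name, stats in per_file.items():
        plt.plot([stats.get(k, 0.0) for k in keys], label=name)
    plt.xlabel("tensor (sorted by name)")
    plt.ylabel("Σ(Bias)")
    plt.legend()
    plt.show()
```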
@ubergarm, sorry for the delayed reply, it was a hectic week at work. Love the visualizations and your Reddit post btw.
The discrepancies are due to using different calibration files to generate the respective imatrices. On quick inspection, your imatrix seems to have "exercised" more weights (it has stronger/larger activations), and its mean of means is considerably larger (Bartowski: 1.94, Ubergarm: 2.31, Unsloth: 2.0)
Ignoring the basic stats (min, max, mean, std dev, etc), I find that the sum of activations (bias) is the most useful metric to select which layers to up/down quantize, as it yields the lowest PPL compared to using ZD or CosSim, or at least that's how I'm reading the tea leaves. All the models in my HF repo have now been generated that way.
For larger models (30B+) however, I'm looking at a different approach that combines layer-wise quantization with pruning. That PR is in draft while I test that it works as expected with split gguf files, but give it a try if you have some spare time.
For pruning, it's looking like CosSim would be the better way to identify layers to remove. I'll push a new version of this PR with added functionality in a few days.
Since you have a good set of test results for Qwen3-30B-A3B, I'll produce layer-wise and layer-wise+pruned versions for an apples-to-apples comparison.
@ubergarm, just finished uploading Qwen3-30B-A3B-GGUF. Summary of scores in the model card, and actual results in the scores folder.
A few things to consider:
- got a case of stubborn tensors 🙂: block 43 ffn_down_exps, ffn_gate_exps and ffn_up_exps refused to be activated by my calibration file, hence the somewhat smaller imatrix size compared to the ones used in your tests (114MB vs 116MB). I'll improve the calibration and will retry.
- for reference, the mix used to generate Q4_K_M was `--token-embedding-type q3_k --output-tensor-type q4_k --tensor-type "\.([0-9]|1[0-9]|2[0-3])\.attn_k=q3_k" --tensor-type "\.([0-9]|1[0-9]|2[0-3])\.attn_q=q3_k" --tensor-type "\.([0-9]|1[0-9]|2[0-3])\.attn_v=q4_k" --tensor-type attn_v=q5_k --tensor-type "\.([0-9]|1[0124]|1[6-9]|2[0-4]|26)\.ffn_gate_exps=q3_k" --tensor-type "\.([0-9]|1[0124]|1[6-9]|2[0-4]|26)\.ffn_up_exps=q3_k" --tensor-type ffn_down_exps=q5_k`, which corresponds to down-quantizing the blocks with the lowest Σ(Bias) per tensor type (see the sketch after this list)
- dump of the gguf structure is in the scores folder as well
- the above mix resulted in a ~8% smaller file compared to naive, with a PPL of 99.07%
- I'll upload the Q4_K_M scores for Bartowski and Unsloth when I get a chance
- for large param models (30B+), it seems the best size reduction, compared to an equivalent naive quant, is below 10%, hence exploring the pruning route. Will try to upload pruned models in the next couple of weeks
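To make the Σ(Bias)-driven selection concrete, here is a rough sketch of how a `--tensor-type` argument like the one quoted above could be derived from a saved `--show-statistics` report. It reuses the hypothetical `load_stats` parser from the earlier visualization sketch, emits a plain block alternation instead of the compact `[0-9]`-style ranges above, and the file name, tensor type and block count are placeholders rather than the exact recipe used for this upload:

```python
from imatrix_stats import load_stats  # hypothetical parser sketched earlier in this thread


def tensor_type_regex(stats: dict[str, float], tensor: str, n_blocks: int) -> str:
    """Build a llama-quantize --tensor-type block pattern covering the
    n_blocks blocks with the lowest Σ(Bias) for the given tensor type."""
    # keep only "<block>.<tensor>" rows with a numeric block index
    rows = [(int(k.split(".")[0]), v) for k, v in stats.items()
            if k.endswith("." + tensor) and k.split(".")[0].isdigit()]
    rows.sort(key=lambda kv: kv[1])                 # lowest Σ(Bias) first
    blocks = [str(b) for b in sorted(b for b, _ in rows[:n_blocks])]
    return r"\.(" + "|".join(blocks) + r")\." + tensor


stats = load_stats("Qwen3-30B-A3B-stats.txt")       # hypothetical report dump
# e.g. down-quantize the 24 ffn_gate_exps blocks with the lowest Σ(Bias) to q3_k
print(f'--tensor-type "{tensor_type_regex(stats, "ffn_gate_exps", 24)}=q3_k"')
```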
@EAddario
I've had the same issue with almost all the Qwen 30B-A3B models (and fine-tunes, though it's less of an issue with abliterated/uncensored ones); the Qwen3 16B-A3B is also much easier to work with.
To resolve it: increase the number of active experts, then run the calibration / imatrix.
This issue seems to affect Qwen3 MoEs, Llama 4 MoEs, and maybe the new Granite 4 MoE(s). It seems to stem from one or more of:
- the number of experts vs active experts
- the size of the experts (sub 1B)?
- the imatrix calibration file -> even my best ones need a lot of experts active, or a huge imatrix dataset/files
Added weighted statistics per layer (as opposed to per tensor) for Σ(Bias), ZD and CosSim.
Whereas per-tensor statistics are helpful to identify which tensors to up/down quantize when using the llama-quantize --tensor-type option, the per-layer statistics seem useful to guide pruning (PR #13037).
For example, running llama-imatrix --show-statistics on Bartowski's Qwen_Qwen3-30B-A3B.imatrix, the report will also include the following:
Computing weighted statistics per layer (48 layers)
Layer Σ(Bias) ZD CosSim
===============================================
0 6061.60 1.7410% 0.0000
1 11640.92 0.5245% 0.5026
2 28831.86 0.0572% 0.0268
3 23922.71 0.1373% 0.0164
4 24552.72 0.7992% 0.1391
5 26268.89 1.0561% 0.1228
6 31291.89 0.6889% 0.0804
7 31656.05 0.7584% 0.0716
8 31301.72 0.5691% 0.1416
9 32720.18 0.6998% 0.1249
10 32996.34 0.8920% 0.0973
11 35464.12 1.1090% 0.0915
12 36960.77 1.0656% 0.1651
13 41183.71 1.1070% 0.1192
14 42311.34 0.8827% 0.0878
15 47201.43 0.7962% 0.1710
16 48093.70 1.2141% 0.1126
17 48655.36 1.0467% 0.1230
18 61740.04 0.6507% 0.0841
19 56956.36 0.8035% 0.0679
20 53476.46 0.9887% 0.2341
21 51980.02 0.7850% 0.1135
22 57266.32 0.2736% 0.0519
23 60166.23 1.3932% 0.0503
24 61621.80 1.1692% 0.0996
25 67485.11 0.3240% 0.0499
26 62086.84 1.4648% 0.0541
27 68444.67 1.1134% 0.1000
28 72305.86 1.0958% 0.2497
29 73116.39 0.8709% 0.0691
30 85119.15 0.9191% 0.0798
31 79558.44 1.0303% 0.1371
32 76364.62 0.6333% 0.2268
33 77971.35 0.7549% 0.1053
34 91862.55 0.8468% 0.2912
35 91125.50 1.3788% 0.0777
36 93407.48 1.1753% 0.1083
37 108214.91 1.3324% 0.1090
38 114657.11 1.4874% 0.1249
39 135057.31 1.0944% 0.1257
40 157113.47 1.2505% 0.2380
41 179461.16 1.1735% 0.1400
42 209527.08 0.9848% 0.0949
43 231601.23 1.5293% 0.1263
44 253612.80 1.6118% 0.2824
45 284678.88 1.4165% 0.2286
46 315767.62 1.2535% 0.2478
47 461904.62 0.7146% 0.2353
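As an illustration of how this per-layer table could feed the pruning workflow mentioned above, the sketch below parses the three columns from a saved report and ranks layers by their CosSim with the previous layer. The parser and the file name are assumptions, not functionality included in this PR:

```python
import re


def load_layer_stats(path: str) -> list[tuple[int, float, float, float]]:
    """Parse the 'weighted statistics per layer' report into
    (layer, sum_bias, zd_percent, cos_sim) tuples."""
    row = re.compile(r"^\s*(\d+)\s+([0-9.]+)\s+([0-9.]+)%\s+([0-9.]+)\s*$")
    rows = []
    with open(path) as f:
        for line in f:
            m = row.match(line)
            if m:
                layer, bias, zd, cos = m.groups()
                rows.append((int(layer), float(bias), float(zd), float(cos)))
    return rows


layers = load_layer_stats("Qwen3-30B-A3B-layer-stats.txt")  # hypothetical report dump
# rank layers by their CosSim with the previous layer; the thread's working
# hypothesis is that this ranking helps shortlist layers to prune
for layer, bias, zd, cos in sorted(layers, key=lambda r: r[3]):
    print(f"layer {layer:3d}  CosSim {cos:.4f}  Σ(Bias) {bias:12.2f}  ZD {zd:.4f}%")
```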
Apologies for the shotgun approach @slaren / @CISC / @ggerganov, I'm not sure what the proper process is to request a review. Happy to close or move to draft if this isn't suitable for merging
@compilade, would love to see #9400 being merged into master! It will open some really interesting possibilities, like being able to store the tensors' state alongside the activations. That would allow for more powerful stats, a clean way to test different imatrices, etc. My coding skills are probably not up to scratch, but I'd be happy to lend a hand
Hi @compilade, good to go? or should I change something else?
The only thing that's bothering me is that the stats aren't calculated on the actual activations
Agree 100%, and no doubt #9400 will provide a neat way to address this. In the meantime, I'll update the README.md file over the weekend to document what the new option is actually doing and what the calculated stats really mean, and will then re-request a review
@compilade, I've updated the README file to reflect the current limitations when calculating the stats, and made a note to reimplement/improve the functionality once #9400 is merged. It's completely up to you whether to merge now or wait until #9400 is in place.
@compilade / @CISC, I have finished merging and testing. As far as I can tell, it is good to go. I don't plan to make any more changes to this PR
Thank you @CISC and @compilade