llama.cpp
Use full range for q4_0 quantization
By preserving the sign of the highest-magnitude value, we can make sure that value maps to -8 in our [-8, 7] range, which is currently unused. This is a bit of a freebie, since the change is fully backwards compatible with the current format.
This was also noted in https://github.com/ggerganov/llama.cpp/issues/397#issuecomment-1480291055 but has not been fixed in the code yet.
This PR only updates the reference implementation, not the SIMD accelerated versions.
quantize-stats output (see #728):

before (7B):
q4_0 : mse 0.00000492, maxerr 0.14257812, 95pct<0.0040, median<0.0018
q4_1 : mse 0.00000318, maxerr 0.12756348, 95pct<0.0034, median<0.0014

after (7B):
q4_0 : mse 0.00000386, maxerr 0.18200684, 95pct<0.0036, median<0.0016
q4_1 : mse 0.00000318, maxerr 0.12756348, 95pct<0.0034, median<0.0014
Most layers see a reduced maxerr, but the overall maximum error is actually slightly higher. TODO: run perplexity.
quantize-stats before (7B)
q4_0::layers.0.attention.wk.weight : mse 0.00001216, maxerr 0.07836914, 95pct<0.0070, median<0.0022
q4_0::layers.0.attention.wo.weight : mse 0.00000145, maxerr 0.03571428, 95pct<0.0024, median<0.0010
q4_0::layers.0.attention.wq.weight : mse 0.00001269, maxerr 0.05469622, 95pct<0.0074, median<0.0020
q4_0::layers.0.attention.wv.weight : mse 0.00000176, maxerr 0.00775364, 95pct<0.0026, median<0.0010
q4_0::layers.0.feed_forward.w1.weight : mse 0.00000271, maxerr 0.08459473, 95pct<0.0030, median<0.0014
q4_0::layers.0.feed_forward.w2.weight : mse 0.00000403, maxerr 0.05426896, 95pct<0.0036, median<0.0016
q4_0::layers.0.feed_forward.w3.weight : mse 0.00000254, maxerr 0.02079773, 95pct<0.0030, median<0.0014
q4_0::layers.1.attention.wk.weight : mse 0.00001151, maxerr 0.04473877, 95pct<0.0072, median<0.0020
q4_0::layers.1.attention.wo.weight : mse 0.00000136, maxerr 0.03808594, 95pct<0.0024, median<0.0008
q4_0::layers.1.attention.wq.weight : mse 0.00001098, maxerr 0.03996059, 95pct<0.0070, median<0.0020
q4_0::layers.1.attention.wv.weight : mse 0.00000130, maxerr 0.00764084, 95pct<0.0022, median<0.0010
q4_0::layers.1.feed_forward.w1.weight : mse 0.00000437, maxerr 0.04241943, 95pct<0.0038, median<0.0018
q4_0::layers.1.feed_forward.w2.weight : mse 0.00000422, maxerr 0.06106567, 95pct<0.0038, median<0.0018
q4_0::layers.1.feed_forward.w3.weight : mse 0.00000394, maxerr 0.02634103, 95pct<0.0036, median<0.0016
q4_0::layers.10.attention.wk.weight : mse 0.00000732, maxerr 0.02239336, 95pct<0.0054, median<0.0020
q4_0::layers.10.attention.wo.weight : mse 0.00000298, maxerr 0.03090122, 95pct<0.0032, median<0.0014
q4_0::layers.10.attention.wq.weight : mse 0.00000716, maxerr 0.04156494, 95pct<0.0052, median<0.0020
q4_0::layers.10.attention.wv.weight : mse 0.00000300, maxerr 0.01774597, 95pct<0.0032, median<0.0014
q4_0::layers.10.feed_forward.w1.weight : mse 0.00000484, maxerr 0.02800424, 95pct<0.0040, median<0.0018
q4_0::layers.10.feed_forward.w2.weight : mse 0.00000438, maxerr 0.05075073, 95pct<0.0038, median<0.0018
q4_0::layers.10.feed_forward.w3.weight : mse 0.00000453, maxerr 0.02395630, 95pct<0.0038, median<0.0018
q4_0::layers.11.attention.wk.weight : mse 0.00000783, maxerr 0.02528381, 95pct<0.0054, median<0.0020
q4_0::layers.11.attention.wo.weight : mse 0.00000327, maxerr 0.03372192, 95pct<0.0034, median<0.0016
q4_0::layers.11.attention.wq.weight : mse 0.00000759, maxerr 0.04904175, 95pct<0.0052, median<0.0022
q4_0::layers.11.attention.wv.weight : mse 0.00000332, maxerr 0.01550729, 95pct<0.0034, median<0.0016
q4_0::layers.11.feed_forward.w1.weight : mse 0.00000482, maxerr 0.02865601, 95pct<0.0040, median<0.0018
q4_0::layers.11.feed_forward.w2.weight : mse 0.00000446, maxerr 0.06465366, 95pct<0.0038, median<0.0018
q4_0::layers.11.feed_forward.w3.weight : mse 0.00000458, maxerr 0.02845764, 95pct<0.0038, median<0.0018
q4_0::layers.12.attention.wk.weight : mse 0.00000713, maxerr 0.02630615, 95pct<0.0052, median<0.0020
q4_0::layers.12.attention.wo.weight : mse 0.00000315, maxerr 0.02679443, 95pct<0.0032, median<0.0016
q4_0::layers.12.attention.wq.weight : mse 0.00000690, maxerr 0.04470825, 95pct<0.0052, median<0.0020
q4_0::layers.12.attention.wv.weight : mse 0.00000309, maxerr 0.00950840, 95pct<0.0032, median<0.0016
q4_0::layers.12.feed_forward.w1.weight : mse 0.00000487, maxerr 0.04248047, 95pct<0.0040, median<0.0018
q4_0::layers.12.feed_forward.w2.weight : mse 0.00000447, maxerr 0.06710379, 95pct<0.0038, median<0.0018
q4_0::layers.12.feed_forward.w3.weight : mse 0.00000464, maxerr 0.02052089, 95pct<0.0040, median<0.0018
q4_0::layers.13.attention.wk.weight : mse 0.00000680, maxerr 0.02060809, 95pct<0.0052, median<0.0020
q4_0::layers.13.attention.wo.weight : mse 0.00000339, maxerr 0.03969029, 95pct<0.0034, median<0.0016
q4_0::layers.13.attention.wq.weight : mse 0.00000658, maxerr 0.04226249, 95pct<0.0050, median<0.0020
q4_0::layers.13.attention.wv.weight : mse 0.00000339, maxerr 0.00988333, 95pct<0.0034, median<0.0016
q4_0::layers.13.feed_forward.w1.weight : mse 0.00000483, maxerr 0.02473232, 95pct<0.0040, median<0.0018
q4_0::layers.13.feed_forward.w2.weight : mse 0.00000454, maxerr 0.03540039, 95pct<0.0038, median<0.0018
q4_0::layers.13.feed_forward.w3.weight : mse 0.00000473, maxerr 0.02145386, 95pct<0.0040, median<0.0018
q4_0::layers.14.attention.wk.weight : mse 0.00000678, maxerr 0.02231925, 95pct<0.0050, median<0.0020
q4_0::layers.14.attention.wo.weight : mse 0.00000340, maxerr 0.03186035, 95pct<0.0034, median<0.0016
q4_0::layers.14.attention.wq.weight : mse 0.00000667, maxerr 0.04726301, 95pct<0.0050, median<0.0020
q4_0::layers.14.attention.wv.weight : mse 0.00000343, maxerr 0.01053074, 95pct<0.0034, median<0.0016
q4_0::layers.14.feed_forward.w1.weight : mse 0.00000482, maxerr 0.02782331, 95pct<0.0040, median<0.0018
q4_0::layers.14.feed_forward.w2.weight : mse 0.00000459, maxerr 0.06702532, 95pct<0.0038, median<0.0018
q4_0::layers.14.feed_forward.w3.weight : mse 0.00000476, maxerr 0.02725874, 95pct<0.0040, median<0.0018
q4_0::layers.15.attention.wk.weight : mse 0.00000692, maxerr 0.02291870, 95pct<0.0052, median<0.0020
q4_0::layers.15.attention.wo.weight : mse 0.00000342, maxerr 0.02814592, 95pct<0.0034, median<0.0016
q4_0::layers.15.attention.wq.weight : mse 0.00000667, maxerr 0.04160418, 95pct<0.0050, median<0.0020
q4_0::layers.15.attention.wv.weight : mse 0.00000344, maxerr 0.01073456, 95pct<0.0034, median<0.0016
q4_0::layers.15.feed_forward.w1.weight : mse 0.00000482, maxerr 0.02490234, 95pct<0.0040, median<0.0018
q4_0::layers.15.feed_forward.w2.weight : mse 0.00000459, maxerr 0.07189941, 95pct<0.0038, median<0.0018
q4_0::layers.15.feed_forward.w3.weight : mse 0.00000477, maxerr 0.02241516, 95pct<0.0040, median<0.0018
q4_0::layers.16.attention.wk.weight : mse 0.00000683, maxerr 0.02305603, 95pct<0.0050, median<0.0020
q4_0::layers.16.attention.wo.weight : mse 0.00000384, maxerr 0.04959106, 95pct<0.0036, median<0.0016
q4_0::layers.16.attention.wq.weight : mse 0.00000649, maxerr 0.04993547, 95pct<0.0048, median<0.0020
q4_0::layers.16.attention.wv.weight : mse 0.00000390, maxerr 0.01102993, 95pct<0.0036, median<0.0016
q4_0::layers.16.feed_forward.w1.weight : mse 0.00000489, maxerr 0.02763585, 95pct<0.0040, median<0.0018
q4_0::layers.16.feed_forward.w2.weight : mse 0.00000458, maxerr 0.07447161, 95pct<0.0038, median<0.0018
q4_0::layers.16.feed_forward.w3.weight : mse 0.00000474, maxerr 0.02688599, 95pct<0.0040, median<0.0018
q4_0::layers.17.attention.wk.weight : mse 0.00000648, maxerr 0.02310181, 95pct<0.0050, median<0.0020
q4_0::layers.17.attention.wo.weight : mse 0.00000394, maxerr 0.03461565, 95pct<0.0036, median<0.0016
q4_0::layers.17.attention.wq.weight : mse 0.00000622, maxerr 0.05617850, 95pct<0.0048, median<0.0020
q4_0::layers.17.attention.wv.weight : mse 0.00000395, maxerr 0.01605225, 95pct<0.0036, median<0.0016
q4_0::layers.17.feed_forward.w1.weight : mse 0.00000491, maxerr 0.02258301, 95pct<0.0040, median<0.0018
q4_0::layers.17.feed_forward.w2.weight : mse 0.00000462, maxerr 0.05793108, 95pct<0.0038, median<0.0018
q4_0::layers.17.feed_forward.w3.weight : mse 0.00000476, maxerr 0.03012085, 95pct<0.0040, median<0.0018
q4_0::layers.18.attention.wk.weight : mse 0.00000624, maxerr 0.02255685, 95pct<0.0048, median<0.0018
q4_0::layers.18.attention.wo.weight : mse 0.00000391, maxerr 0.03451974, 95pct<0.0036, median<0.0016
q4_0::layers.18.attention.wq.weight : mse 0.00000608, maxerr 0.04820906, 95pct<0.0048, median<0.0018
q4_0::layers.18.attention.wv.weight : mse 0.00000393, maxerr 0.00875637, 95pct<0.0036, median<0.0016
q4_0::layers.18.feed_forward.w1.weight : mse 0.00000499, maxerr 0.02549744, 95pct<0.0040, median<0.0018
q4_0::layers.18.feed_forward.w2.weight : mse 0.00000460, maxerr 0.07299805, 95pct<0.0038, median<0.0018
q4_0::layers.18.feed_forward.w3.weight : mse 0.00000473, maxerr 0.02032471, 95pct<0.0040, median<0.0018
q4_0::layers.19.attention.wk.weight : mse 0.00000602, maxerr 0.02365112, 95pct<0.0048, median<0.0018
q4_0::layers.19.attention.wo.weight : mse 0.00000425, maxerr 0.03796387, 95pct<0.0038, median<0.0018
q4_0::layers.19.attention.wq.weight : mse 0.00000587, maxerr 0.05416870, 95pct<0.0046, median<0.0018
q4_0::layers.19.attention.wv.weight : mse 0.00000432, maxerr 0.01004791, 95pct<0.0038, median<0.0018
q4_0::layers.19.feed_forward.w1.weight : mse 0.00000504, maxerr 0.03585815, 95pct<0.0040, median<0.0018
q4_0::layers.19.feed_forward.w2.weight : mse 0.00000461, maxerr 0.04916382, 95pct<0.0038, median<0.0018
q4_0::layers.19.feed_forward.w3.weight : mse 0.00000471, maxerr 0.02484567, 95pct<0.0040, median<0.0018
q4_0::layers.2.attention.wk.weight : mse 0.00001387, maxerr 0.03720093, 95pct<0.0076, median<0.0024
q4_0::layers.2.attention.wo.weight : mse 0.00000191, maxerr 0.04980469, 95pct<0.0026, median<0.0012
q4_0::layers.2.attention.wq.weight : mse 0.00001285, maxerr 0.04199655, 95pct<0.0072, median<0.0026
q4_0::layers.2.attention.wv.weight : mse 0.00000184, maxerr 0.01274654, 95pct<0.0026, median<0.0012
q4_0::layers.2.feed_forward.w1.weight : mse 0.00000486, maxerr 0.04729353, 95pct<0.0040, median<0.0018
q4_0::layers.2.feed_forward.w2.weight : mse 0.00000418, maxerr 0.09649658, 95pct<0.0036, median<0.0018
q4_0::layers.2.feed_forward.w3.weight : mse 0.00000404, maxerr 0.03546143, 95pct<0.0036, median<0.0018
q4_0::layers.20.attention.wk.weight : mse 0.00000620, maxerr 0.02835519, 95pct<0.0048, median<0.0020
q4_0::layers.20.attention.wo.weight : mse 0.00000447, maxerr 0.03102329, 95pct<0.0038, median<0.0018
q4_0::layers.20.attention.wq.weight : mse 0.00000603, maxerr 0.06380789, 95pct<0.0046, median<0.0020
q4_0::layers.20.attention.wv.weight : mse 0.00000459, maxerr 0.01255689, 95pct<0.0040, median<0.0018
q4_0::layers.20.feed_forward.w1.weight : mse 0.00000510, maxerr 0.03062439, 95pct<0.0042, median<0.0018
q4_0::layers.20.feed_forward.w2.weight : mse 0.00000462, maxerr 0.07861328, 95pct<0.0038, median<0.0018
q4_0::layers.20.feed_forward.w3.weight : mse 0.00000471, maxerr 0.02113124, 95pct<0.0040, median<0.0018
q4_0::layers.21.attention.wk.weight : mse 0.00000576, maxerr 0.02944946, 95pct<0.0046, median<0.0018
q4_0::layers.21.attention.wo.weight : mse 0.00000452, maxerr 0.05029297, 95pct<0.0038, median<0.0018
q4_0::layers.21.attention.wq.weight : mse 0.00000565, maxerr 0.05474854, 95pct<0.0046, median<0.0018
q4_0::layers.21.attention.wv.weight : mse 0.00000465, maxerr 0.00916617, 95pct<0.0040, median<0.0018
q4_0::layers.21.feed_forward.w1.weight : mse 0.00000515, maxerr 0.02553013, 95pct<0.0042, median<0.0020
q4_0::layers.21.feed_forward.w2.weight : mse 0.00000461, maxerr 0.04321289, 95pct<0.0038, median<0.0018
q4_0::layers.21.feed_forward.w3.weight : mse 0.00000470, maxerr 0.01762608, 95pct<0.0040, median<0.0018
q4_0::layers.22.attention.wk.weight : mse 0.00000593, maxerr 0.02319990, 95pct<0.0046, median<0.0018
q4_0::layers.22.attention.wo.weight : mse 0.00000455, maxerr 0.05803570, 95pct<0.0038, median<0.0018
q4_0::layers.22.attention.wq.weight : mse 0.00000584, maxerr 0.05233765, 95pct<0.0046, median<0.0018
q4_0::layers.22.attention.wv.weight : mse 0.00000460, maxerr 0.00913565, 95pct<0.0040, median<0.0018
q4_0::layers.22.feed_forward.w1.weight : mse 0.00000516, maxerr 0.03031921, 95pct<0.0042, median<0.0020
q4_0::layers.22.feed_forward.w2.weight : mse 0.00000465, maxerr 0.05004447, 95pct<0.0040, median<0.0018
q4_0::layers.22.feed_forward.w3.weight : mse 0.00000474, maxerr 0.03417097, 95pct<0.0040, median<0.0018
q4_0::layers.23.attention.wk.weight : mse 0.00000553, maxerr 0.02395194, 95pct<0.0046, median<0.0018
q4_0::layers.23.attention.wo.weight : mse 0.00000480, maxerr 0.05573380, 95pct<0.0040, median<0.0018
q4_0::layers.23.attention.wq.weight : mse 0.00000549, maxerr 0.04861450, 95pct<0.0046, median<0.0018
q4_0::layers.23.attention.wv.weight : mse 0.00000497, maxerr 0.01011222, 95pct<0.0040, median<0.0018
q4_0::layers.23.feed_forward.w1.weight : mse 0.00000518, maxerr 0.04048811, 95pct<0.0042, median<0.0020
q4_0::layers.23.feed_forward.w2.weight : mse 0.00000469, maxerr 0.04928589, 95pct<0.0040, median<0.0018
q4_0::layers.23.feed_forward.w3.weight : mse 0.00000476, maxerr 0.02658953, 95pct<0.0040, median<0.0018
q4_0::layers.24.attention.wk.weight : mse 0.00000557, maxerr 0.02485657, 95pct<0.0046, median<0.0018
q4_0::layers.24.attention.wo.weight : mse 0.00000492, maxerr 0.03874861, 95pct<0.0040, median<0.0018
q4_0::layers.24.attention.wq.weight : mse 0.00000550, maxerr 0.05524989, 95pct<0.0044, median<0.0018
q4_0::layers.24.attention.wv.weight : mse 0.00000509, maxerr 0.00925663, 95pct<0.0042, median<0.0020
q4_0::layers.24.feed_forward.w1.weight : mse 0.00000519, maxerr 0.02595520, 95pct<0.0042, median<0.0020
q4_0::layers.24.feed_forward.w2.weight : mse 0.00000473, maxerr 0.07434082, 95pct<0.0040, median<0.0018
q4_0::layers.24.feed_forward.w3.weight : mse 0.00000481, maxerr 0.02382333, 95pct<0.0040, median<0.0018
q4_0::layers.25.attention.wk.weight : mse 0.00000593, maxerr 0.02381897, 95pct<0.0046, median<0.0020
q4_0::layers.25.attention.wo.weight : mse 0.00000501, maxerr 0.05418178, 95pct<0.0040, median<0.0018
q4_0::layers.25.attention.wq.weight : mse 0.00000580, maxerr 0.04802595, 95pct<0.0046, median<0.0018
q4_0::layers.25.attention.wv.weight : mse 0.00000513, maxerr 0.00928388, 95pct<0.0042, median<0.0020
q4_0::layers.25.feed_forward.w1.weight : mse 0.00000522, maxerr 0.02363586, 95pct<0.0042, median<0.0020
q4_0::layers.25.feed_forward.w2.weight : mse 0.00000476, maxerr 0.04512678, 95pct<0.0040, median<0.0018
q4_0::layers.25.feed_forward.w3.weight : mse 0.00000484, maxerr 0.02100045, 95pct<0.0040, median<0.0018
q4_0::layers.26.attention.wk.weight : mse 0.00000573, maxerr 0.03003583, 95pct<0.0046, median<0.0018
q4_0::layers.26.attention.wo.weight : mse 0.00000531, maxerr 0.02968707, 95pct<0.0042, median<0.0020
q4_0::layers.26.attention.wq.weight : mse 0.00000562, maxerr 0.04742432, 95pct<0.0044, median<0.0018
q4_0::layers.26.attention.wv.weight : mse 0.00000544, maxerr 0.01335907, 95pct<0.0042, median<0.0020
q4_0::layers.26.feed_forward.w1.weight : mse 0.00000521, maxerr 0.04232788, 95pct<0.0042, median<0.0020
q4_0::layers.26.feed_forward.w2.weight : mse 0.00000482, maxerr 0.05311366, 95pct<0.0040, median<0.0018
q4_0::layers.26.feed_forward.w3.weight : mse 0.00000492, maxerr 0.03234427, 95pct<0.0040, median<0.0018
q4_0::layers.27.attention.wk.weight : mse 0.00000570, maxerr 0.02717590, 95pct<0.0046, median<0.0018
q4_0::layers.27.attention.wo.weight : mse 0.00000557, maxerr 0.06518555, 95pct<0.0042, median<0.0020
q4_0::layers.27.attention.wq.weight : mse 0.00000564, maxerr 0.04687936, 95pct<0.0044, median<0.0018
q4_0::layers.27.attention.wv.weight : mse 0.00000563, maxerr 0.01382010, 95pct<0.0044, median<0.0020
q4_0::layers.27.feed_forward.w1.weight : mse 0.00000521, maxerr 0.03359985, 95pct<0.0042, median<0.0020
q4_0::layers.27.feed_forward.w2.weight : mse 0.00000488, maxerr 0.05447388, 95pct<0.0040, median<0.0018
q4_0::layers.27.feed_forward.w3.weight : mse 0.00000496, maxerr 0.04153442, 95pct<0.0040, median<0.0018
q4_0::layers.28.attention.wk.weight : mse 0.00000545, maxerr 0.02662441, 95pct<0.0044, median<0.0018
q4_0::layers.28.attention.wo.weight : mse 0.00000569, maxerr 0.03404018, 95pct<0.0044, median<0.0020
q4_0::layers.28.attention.wq.weight : mse 0.00000541, maxerr 0.04815674, 95pct<0.0044, median<0.0018
q4_0::layers.28.attention.wv.weight : mse 0.00000569, maxerr 0.01067243, 95pct<0.0044, median<0.0020
q4_0::layers.28.feed_forward.w1.weight : mse 0.00000516, maxerr 0.03170776, 95pct<0.0042, median<0.0020
q4_0::layers.28.feed_forward.w2.weight : mse 0.00000493, maxerr 0.05703735, 95pct<0.0040, median<0.0018
q4_0::layers.28.feed_forward.w3.weight : mse 0.00000501, maxerr 0.03425816, 95pct<0.0040, median<0.0020
q4_0::layers.29.attention.wk.weight : mse 0.00000537, maxerr 0.02471052, 95pct<0.0044, median<0.0018
q4_0::layers.29.attention.wo.weight : mse 0.00000604, maxerr 0.04220146, 95pct<0.0044, median<0.0020
q4_0::layers.29.attention.wq.weight : mse 0.00000531, maxerr 0.04730660, 95pct<0.0044, median<0.0018
q4_0::layers.29.attention.wv.weight : mse 0.00000603, maxerr 0.01110731, 95pct<0.0044, median<0.0020
q4_0::layers.29.feed_forward.w1.weight : mse 0.00000519, maxerr 0.03314209, 95pct<0.0042, median<0.0020
q4_0::layers.29.feed_forward.w2.weight : mse 0.00000499, maxerr 0.09802246, 95pct<0.0040, median<0.0018
q4_0::layers.29.feed_forward.w3.weight : mse 0.00000507, maxerr 0.03025600, 95pct<0.0040, median<0.0020
q4_0::layers.3.attention.wk.weight : mse 0.00000954, maxerr 0.02493504, 95pct<0.0062, median<0.0022
q4_0::layers.3.attention.wo.weight : mse 0.00000257, maxerr 0.03826904, 95pct<0.0030, median<0.0014
q4_0::layers.3.attention.wq.weight : mse 0.00000872, maxerr 0.05447824, 95pct<0.0056, median<0.0022
q4_0::layers.3.attention.wv.weight : mse 0.00000258, maxerr 0.00874329, 95pct<0.0030, median<0.0014
q4_0::layers.3.feed_forward.w1.weight : mse 0.00000496, maxerr 0.03732300, 95pct<0.0040, median<0.0018
q4_0::layers.3.feed_forward.w2.weight : mse 0.00000422, maxerr 0.06250000, 95pct<0.0038, median<0.0018
q4_0::layers.3.feed_forward.w3.weight : mse 0.00000420, maxerr 0.02593994, 95pct<0.0038, median<0.0018
q4_0::layers.30.attention.wk.weight : mse 0.00000549, maxerr 0.02535576, 95pct<0.0044, median<0.0018
q4_0::layers.30.attention.wo.weight : mse 0.00000602, maxerr 0.06033325, 95pct<0.0044, median<0.0020
q4_0::layers.30.attention.wq.weight : mse 0.00000545, maxerr 0.04311262, 95pct<0.0044, median<0.0018
q4_0::layers.30.attention.wv.weight : mse 0.00000588, maxerr 0.01166643, 95pct<0.0044, median<0.0020
q4_0::layers.30.feed_forward.w1.weight : mse 0.00000526, maxerr 0.02958679, 95pct<0.0042, median<0.0020
q4_0::layers.30.feed_forward.w2.weight : mse 0.00000520, maxerr 0.14257812, 95pct<0.0040, median<0.0018
q4_0::layers.30.feed_forward.w3.weight : mse 0.00000517, maxerr 0.04168701, 95pct<0.0042, median<0.0020
q4_0::layers.31.attention.wk.weight : mse 0.00000586, maxerr 0.02397810, 95pct<0.0046, median<0.0018
q4_0::layers.31.attention.wo.weight : mse 0.00000490, maxerr 0.11397886, 95pct<0.0040, median<0.0018
q4_0::layers.31.attention.wq.weight : mse 0.00000561, maxerr 0.03053502, 95pct<0.0044, median<0.0018
q4_0::layers.31.attention.wv.weight : mse 0.00000479, maxerr 0.01826041, 95pct<0.0040, median<0.0018
q4_0::layers.31.feed_forward.w1.weight : mse 0.00000574, maxerr 0.03063965, 95pct<0.0044, median<0.0020
q4_0::layers.31.feed_forward.w2.weight : mse 0.00000529, maxerr 0.11260986, 95pct<0.0042, median<0.0020
q4_0::layers.31.feed_forward.w3.weight : mse 0.00000562, maxerr 0.04486084, 95pct<0.0044, median<0.0020
q4_0::layers.4.attention.wk.weight : mse 0.00000918, maxerr 0.02430725, 95pct<0.0060, median<0.0022
q4_0::layers.4.attention.wo.weight : mse 0.00000257, maxerr 0.03571430, 95pct<0.0030, median<0.0014
q4_0::layers.4.attention.wq.weight : mse 0.00000902, maxerr 0.05325317, 95pct<0.0058, median<0.0022
q4_0::layers.4.attention.wv.weight : mse 0.00000258, maxerr 0.01036835, 95pct<0.0030, median<0.0014
q4_0::layers.4.feed_forward.w1.weight : mse 0.00000509, maxerr 0.04565430, 95pct<0.0040, median<0.0018
q4_0::layers.4.feed_forward.w2.weight : mse 0.00000419, maxerr 0.06991141, 95pct<0.0038, median<0.0018
q4_0::layers.4.feed_forward.w3.weight : mse 0.00000423, maxerr 0.03997803, 95pct<0.0038, median<0.0018
q4_0::layers.5.attention.wk.weight : mse 0.00000817, maxerr 0.03778076, 95pct<0.0056, median<0.0020
q4_0::layers.5.attention.wo.weight : mse 0.00000264, maxerr 0.04791260, 95pct<0.0030, median<0.0014
q4_0::layers.5.attention.wq.weight : mse 0.00000805, maxerr 0.04947335, 95pct<0.0054, median<0.0022
q4_0::layers.5.attention.wv.weight : mse 0.00000267, maxerr 0.01681519, 95pct<0.0030, median<0.0014
q4_0::layers.5.feed_forward.w1.weight : mse 0.00000530, maxerr 0.03759766, 95pct<0.0042, median<0.0020
q4_0::layers.5.feed_forward.w2.weight : mse 0.00000411, maxerr 0.05007935, 95pct<0.0036, median<0.0018
q4_0::layers.5.feed_forward.w3.weight : mse 0.00000419, maxerr 0.02728271, 95pct<0.0038, median<0.0018
q4_0::layers.6.attention.wk.weight : mse 0.00000854, maxerr 0.02463859, 95pct<0.0058, median<0.0022
q4_0::layers.6.attention.wo.weight : mse 0.00000269, maxerr 0.04042272, 95pct<0.0030, median<0.0014
q4_0::layers.6.attention.wq.weight : mse 0.00000817, maxerr 0.06672886, 95pct<0.0056, median<0.0022
q4_0::layers.6.attention.wv.weight : mse 0.00000271, maxerr 0.00993238, 95pct<0.0030, median<0.0014
q4_0::layers.6.feed_forward.w1.weight : mse 0.00000516, maxerr 0.04476929, 95pct<0.0042, median<0.0020
q4_0::layers.6.feed_forward.w2.weight : mse 0.00000419, maxerr 0.06134033, 95pct<0.0038, median<0.0018
q4_0::layers.6.feed_forward.w3.weight : mse 0.00000430, maxerr 0.02858843, 95pct<0.0038, median<0.0018
q4_0::layers.7.attention.wk.weight : mse 0.00000803, maxerr 0.02537537, 95pct<0.0056, median<0.0020
q4_0::layers.7.attention.wo.weight : mse 0.00000281, maxerr 0.03941128, 95pct<0.0030, median<0.0014
q4_0::layers.7.attention.wq.weight : mse 0.00000790, maxerr 0.05334473, 95pct<0.0054, median<0.0020
q4_0::layers.7.attention.wv.weight : mse 0.00000288, maxerr 0.01028442, 95pct<0.0032, median<0.0014
q4_0::layers.7.feed_forward.w1.weight : mse 0.00000506, maxerr 0.03002494, 95pct<0.0042, median<0.0018
q4_0::layers.7.feed_forward.w2.weight : mse 0.00000423, maxerr 0.04916382, 95pct<0.0038, median<0.0018
q4_0::layers.7.feed_forward.w3.weight : mse 0.00000433, maxerr 0.03115845, 95pct<0.0038, median<0.0018
q4_0::layers.8.attention.wk.weight : mse 0.00000764, maxerr 0.02466692, 95pct<0.0054, median<0.0020
q4_0::layers.8.attention.wo.weight : mse 0.00000278, maxerr 0.03404018, 95pct<0.0030, median<0.0014
q4_0::layers.8.attention.wq.weight : mse 0.00000764, maxerr 0.04733276, 95pct<0.0054, median<0.0020
q4_0::layers.8.attention.wv.weight : mse 0.00000282, maxerr 0.01240540, 95pct<0.0032, median<0.0014
q4_0::layers.8.feed_forward.w1.weight : mse 0.00000506, maxerr 0.03372192, 95pct<0.0042, median<0.0018
q4_0::layers.8.feed_forward.w2.weight : mse 0.00000423, maxerr 0.04331752, 95pct<0.0038, median<0.0018
q4_0::layers.8.feed_forward.w3.weight : mse 0.00000436, maxerr 0.02461243, 95pct<0.0038, median<0.0018
q4_0::layers.9.attention.wk.weight : mse 0.00000723, maxerr 0.02087402, 95pct<0.0054, median<0.0020
q4_0::layers.9.attention.wo.weight : mse 0.00000274, maxerr 0.03878348, 95pct<0.0030, median<0.0014
q4_0::layers.9.attention.wq.weight : mse 0.00000715, maxerr 0.04925537, 95pct<0.0052, median<0.0020
q4_0::layers.9.attention.wv.weight : mse 0.00000278, maxerr 0.00997925, 95pct<0.0030, median<0.0014
q4_0::layers.9.feed_forward.w1.weight : mse 0.00000492, maxerr 0.04805647, 95pct<0.0040, median<0.0018
q4_0::layers.9.feed_forward.w2.weight : mse 0.00000430, maxerr 0.04580688, 95pct<0.0038, median<0.0018
q4_0::layers.9.feed_forward.w3.weight : mse 0.00000442, maxerr 0.04849243, 95pct<0.0038, median<0.0018
q4_0::output.weight : mse 0.00000394, maxerr 0.02912903, 95pct<0.0036, median<0.0016
q4_0::tok_embeddings.weight : mse 0.00000385, maxerr 0.01875523, 95pct<0.0036, median<0.0016
q4_0 : mse 0.00000492, maxerr 0.14257812, 95pct<0.0040, median<0.0018
q4_1::layers.0.attention.wk.weight : mse 0.00000684, maxerr 0.04107666, 95pct<0.0054, median<0.0016
q4_1::layers.0.attention.wo.weight : mse 0.00000092, maxerr 0.02982175, 95pct<0.0020, median<0.0008
q4_1::layers.0.attention.wq.weight : mse 0.00000702, maxerr 0.02834333, 95pct<0.0056, median<0.0016
q4_1::layers.0.attention.wv.weight : mse 0.00000114, maxerr 0.00560303, 95pct<0.0022, median<0.0008
q4_1::layers.0.feed_forward.w1.weight : mse 0.00000175, maxerr 0.03655243, 95pct<0.0024, median<0.0012
q4_1::layers.0.feed_forward.w2.weight : mse 0.00000259, maxerr 0.04300943, 95pct<0.0030, median<0.0014
q4_1::layers.0.feed_forward.w3.weight : mse 0.00000165, maxerr 0.01006266, 95pct<0.0024, median<0.0012
q4_1::layers.1.attention.wk.weight : mse 0.00000728, maxerr 0.02334900, 95pct<0.0058, median<0.0016
q4_1::layers.1.attention.wo.weight : mse 0.00000086, maxerr 0.03453889, 95pct<0.0018, median<0.0008
q4_1::layers.1.attention.wq.weight : mse 0.00000700, maxerr 0.01987410, 95pct<0.0056, median<0.0016
q4_1::layers.1.attention.wv.weight : mse 0.00000083, maxerr 0.00478211, 95pct<0.0018, median<0.0008
q4_1::layers.1.feed_forward.w1.weight : mse 0.00000283, maxerr 0.02051294, 95pct<0.0030, median<0.0014
q4_1::layers.1.feed_forward.w2.weight : mse 0.00000272, maxerr 0.03843182, 95pct<0.0030, median<0.0014
q4_1::layers.1.feed_forward.w3.weight : mse 0.00000255, maxerr 0.01320738, 95pct<0.0030, median<0.0014
q4_1::layers.10.attention.wk.weight : mse 0.00000472, maxerr 0.01563987, 95pct<0.0044, median<0.0016
q4_1::layers.10.attention.wo.weight : mse 0.00000193, maxerr 0.02667642, 95pct<0.0026, median<0.0012
q4_1::layers.10.attention.wq.weight : mse 0.00000462, maxerr 0.02052003, 95pct<0.0042, median<0.0016
q4_1::layers.10.attention.wv.weight : mse 0.00000194, maxerr 0.00943857, 95pct<0.0026, median<0.0012
q4_1::layers.10.feed_forward.w1.weight : mse 0.00000314, maxerr 0.01556396, 95pct<0.0032, median<0.0014
q4_1::layers.10.feed_forward.w2.weight : mse 0.00000283, maxerr 0.02537537, 95pct<0.0030, median<0.0014
q4_1::layers.10.feed_forward.w3.weight : mse 0.00000294, maxerr 0.01292909, 95pct<0.0032, median<0.0014
q4_1::layers.11.attention.wk.weight : mse 0.00000505, maxerr 0.01603444, 95pct<0.0044, median<0.0016
q4_1::layers.11.attention.wo.weight : mse 0.00000212, maxerr 0.02708334, 95pct<0.0028, median<0.0012
q4_1::layers.11.attention.wq.weight : mse 0.00000490, maxerr 0.02761781, 95pct<0.0042, median<0.0016
q4_1::layers.11.attention.wv.weight : mse 0.00000215, maxerr 0.00829771, 95pct<0.0028, median<0.0012
q4_1::layers.11.feed_forward.w1.weight : mse 0.00000313, maxerr 0.01594034, 95pct<0.0032, median<0.0014
q4_1::layers.11.feed_forward.w2.weight : mse 0.00000288, maxerr 0.03227139, 95pct<0.0032, median<0.0014
q4_1::layers.11.feed_forward.w3.weight : mse 0.00000297, maxerr 0.01712444, 95pct<0.0032, median<0.0014
q4_1::layers.12.attention.wk.weight : mse 0.00000460, maxerr 0.01596579, 95pct<0.0042, median<0.0016
q4_1::layers.12.attention.wo.weight : mse 0.00000205, maxerr 0.02017212, 95pct<0.0026, median<0.0012
q4_1::layers.12.attention.wq.weight : mse 0.00000446, maxerr 0.02285360, 95pct<0.0042, median<0.0016
q4_1::layers.12.attention.wv.weight : mse 0.00000200, maxerr 0.00573961, 95pct<0.0026, median<0.0012
q4_1::layers.12.feed_forward.w1.weight : mse 0.00000316, maxerr 0.02284165, 95pct<0.0032, median<0.0014
q4_1::layers.12.feed_forward.w2.weight : mse 0.00000289, maxerr 0.03540853, 95pct<0.0032, median<0.0014
q4_1::layers.12.feed_forward.w3.weight : mse 0.00000301, maxerr 0.01079203, 95pct<0.0032, median<0.0014
q4_1::layers.13.attention.wk.weight : mse 0.00000439, maxerr 0.01437837, 95pct<0.0042, median<0.0016
q4_1::layers.13.attention.wo.weight : mse 0.00000220, maxerr 0.03025717, 95pct<0.0028, median<0.0012
q4_1::layers.13.attention.wq.weight : mse 0.00000425, maxerr 0.02128702, 95pct<0.0040, median<0.0016
q4_1::layers.13.attention.wv.weight : mse 0.00000219, maxerr 0.00622152, 95pct<0.0028, median<0.0012
q4_1::layers.13.feed_forward.w1.weight : mse 0.00000313, maxerr 0.01500246, 95pct<0.0032, median<0.0014
q4_1::layers.13.feed_forward.w2.weight : mse 0.00000294, maxerr 0.02633134, 95pct<0.0032, median<0.0014
q4_1::layers.13.feed_forward.w3.weight : mse 0.00000307, maxerr 0.01099548, 95pct<0.0032, median<0.0014
q4_1::layers.14.attention.wk.weight : mse 0.00000438, maxerr 0.01511636, 95pct<0.0042, median<0.0016
q4_1::layers.14.attention.wo.weight : mse 0.00000221, maxerr 0.02438152, 95pct<0.0028, median<0.0012
q4_1::layers.14.attention.wq.weight : mse 0.00000431, maxerr 0.02371013, 95pct<0.0040, median<0.0016
q4_1::layers.14.attention.wv.weight : mse 0.00000222, maxerr 0.00633850, 95pct<0.0028, median<0.0012
q4_1::layers.14.feed_forward.w1.weight : mse 0.00000313, maxerr 0.01666871, 95pct<0.0032, median<0.0014
q4_1::layers.14.feed_forward.w2.weight : mse 0.00000297, maxerr 0.03455403, 95pct<0.0032, median<0.0014
q4_1::layers.14.feed_forward.w3.weight : mse 0.00000309, maxerr 0.01462197, 95pct<0.0032, median<0.0016
q4_1::layers.15.attention.wk.weight : mse 0.00000446, maxerr 0.01495159, 95pct<0.0042, median<0.0016
q4_1::layers.15.attention.wo.weight : mse 0.00000222, maxerr 0.02541506, 95pct<0.0028, median<0.0012
q4_1::layers.15.attention.wq.weight : mse 0.00000431, maxerr 0.02229919, 95pct<0.0040, median<0.0016
q4_1::layers.15.attention.wv.weight : mse 0.00000223, maxerr 0.00649338, 95pct<0.0028, median<0.0012
q4_1::layers.15.feed_forward.w1.weight : mse 0.00000313, maxerr 0.01446533, 95pct<0.0032, median<0.0014
q4_1::layers.15.feed_forward.w2.weight : mse 0.00000297, maxerr 0.04414570, 95pct<0.0032, median<0.0014
q4_1::layers.15.feed_forward.w3.weight : mse 0.00000309, maxerr 0.01306508, 95pct<0.0032, median<0.0016
q4_1::layers.16.attention.wk.weight : mse 0.00000441, maxerr 0.01464437, 95pct<0.0040, median<0.0016
q4_1::layers.16.attention.wo.weight : mse 0.00000250, maxerr 0.04169917, 95pct<0.0030, median<0.0014
q4_1::layers.16.attention.wq.weight : mse 0.00000419, maxerr 0.02543133, 95pct<0.0040, median<0.0016
q4_1::layers.16.attention.wv.weight : mse 0.00000253, maxerr 0.00660706, 95pct<0.0030, median<0.0014
q4_1::layers.16.feed_forward.w1.weight : mse 0.00000317, maxerr 0.01479188, 95pct<0.0032, median<0.0014
q4_1::layers.16.feed_forward.w2.weight : mse 0.00000297, maxerr 0.03672282, 95pct<0.0032, median<0.0014
q4_1::layers.16.feed_forward.w3.weight : mse 0.00000307, maxerr 0.01314189, 95pct<0.0032, median<0.0014
q4_1::layers.17.attention.wk.weight : mse 0.00000418, maxerr 0.01311338, 95pct<0.0040, median<0.0016
q4_1::layers.17.attention.wo.weight : mse 0.00000256, maxerr 0.02875367, 95pct<0.0030, median<0.0014
q4_1::layers.17.attention.wq.weight : mse 0.00000401, maxerr 0.03082275, 95pct<0.0038, median<0.0016
q4_1::layers.17.attention.wv.weight : mse 0.00000256, maxerr 0.00813599, 95pct<0.0030, median<0.0014
q4_1::layers.17.feed_forward.w1.weight : mse 0.00000319, maxerr 0.01236165, 95pct<0.0032, median<0.0016
q4_1::layers.17.feed_forward.w2.weight : mse 0.00000299, maxerr 0.02805888, 95pct<0.0032, median<0.0014
q4_1::layers.17.feed_forward.w3.weight : mse 0.00000309, maxerr 0.01775716, 95pct<0.0032, median<0.0016
q4_1::layers.18.attention.wk.weight : mse 0.00000403, maxerr 0.01377869, 95pct<0.0040, median<0.0014
q4_1::layers.18.attention.wo.weight : mse 0.00000254, maxerr 0.03247070, 95pct<0.0030, median<0.0014
q4_1::layers.18.attention.wq.weight : mse 0.00000392, maxerr 0.02439576, 95pct<0.0038, median<0.0014
q4_1::layers.18.attention.wv.weight : mse 0.00000255, maxerr 0.00598729, 95pct<0.0030, median<0.0014
q4_1::layers.18.feed_forward.w1.weight : mse 0.00000324, maxerr 0.01477051, 95pct<0.0034, median<0.0016
q4_1::layers.18.feed_forward.w2.weight : mse 0.00000298, maxerr 0.03860271, 95pct<0.0032, median<0.0014
q4_1::layers.18.feed_forward.w3.weight : mse 0.00000307, maxerr 0.01042479, 95pct<0.0032, median<0.0014
q4_1::layers.19.attention.wk.weight : mse 0.00000388, maxerr 0.01365611, 95pct<0.0038, median<0.0014
q4_1::layers.19.attention.wo.weight : mse 0.00000276, maxerr 0.03216144, 95pct<0.0030, median<0.0014
q4_1::layers.19.attention.wq.weight : mse 0.00000378, maxerr 0.02803510, 95pct<0.0038, median<0.0014
q4_1::layers.19.attention.wv.weight : mse 0.00000280, maxerr 0.00700684, 95pct<0.0030, median<0.0014
q4_1::layers.19.feed_forward.w1.weight : mse 0.00000328, maxerr 0.01841432, 95pct<0.0034, median<0.0016
q4_1::layers.19.feed_forward.w2.weight : mse 0.00000299, maxerr 0.02656788, 95pct<0.0032, median<0.0014
q4_1::layers.19.feed_forward.w3.weight : mse 0.00000306, maxerr 0.01330259, 95pct<0.0032, median<0.0014
q4_1::layers.2.attention.wk.weight : mse 0.00000883, maxerr 0.01976573, 95pct<0.0062, median<0.0020
q4_1::layers.2.attention.wo.weight : mse 0.00000123, maxerr 0.03828126, 95pct<0.0020, median<0.0010
q4_1::layers.2.attention.wq.weight : mse 0.00000823, maxerr 0.02216390, 95pct<0.0058, median<0.0020
q4_1::layers.2.attention.wv.weight : mse 0.00000119, maxerr 0.00736135, 95pct<0.0020, median<0.0010
q4_1::layers.2.feed_forward.w1.weight : mse 0.00000315, maxerr 0.03544718, 95pct<0.0032, median<0.0016
q4_1::layers.2.feed_forward.w2.weight : mse 0.00000271, maxerr 0.05198061, 95pct<0.0030, median<0.0014
q4_1::layers.2.feed_forward.w3.weight : mse 0.00000262, maxerr 0.01909560, 95pct<0.0030, median<0.0014
q4_1::layers.20.attention.wk.weight : mse 0.00000400, maxerr 0.01639201, 95pct<0.0038, median<0.0016
q4_1::layers.20.attention.wo.weight : mse 0.00000290, maxerr 0.02312827, 95pct<0.0032, median<0.0014
q4_1::layers.20.attention.wq.weight : mse 0.00000388, maxerr 0.03564453, 95pct<0.0038, median<0.0016
q4_1::layers.20.attention.wv.weight : mse 0.00000298, maxerr 0.00713094, 95pct<0.0032, median<0.0014
q4_1::layers.20.feed_forward.w1.weight : mse 0.00000331, maxerr 0.01476848, 95pct<0.0034, median<0.0016
q4_1::layers.20.feed_forward.w2.weight : mse 0.00000300, maxerr 0.04094645, 95pct<0.0032, median<0.0014
q4_1::layers.20.feed_forward.w3.weight : mse 0.00000306, maxerr 0.01144791, 95pct<0.0032, median<0.0014
q4_1::layers.21.attention.wk.weight : mse 0.00000371, maxerr 0.01407850, 95pct<0.0038, median<0.0014
q4_1::layers.21.attention.wo.weight : mse 0.00000294, maxerr 0.04772949, 95pct<0.0032, median<0.0014
q4_1::layers.21.attention.wq.weight : mse 0.00000363, maxerr 0.02847900, 95pct<0.0038, median<0.0014
q4_1::layers.21.attention.wv.weight : mse 0.00000302, maxerr 0.00687256, 95pct<0.0032, median<0.0014
q4_1::layers.21.feed_forward.w1.weight : mse 0.00000334, maxerr 0.01481831, 95pct<0.0034, median<0.0016
q4_1::layers.21.feed_forward.w2.weight : mse 0.00000299, maxerr 0.02454491, 95pct<0.0032, median<0.0014
q4_1::layers.21.feed_forward.w3.weight : mse 0.00000305, maxerr 0.00941722, 95pct<0.0032, median<0.0014
q4_1::layers.22.attention.wk.weight : mse 0.00000382, maxerr 0.01457518, 95pct<0.0038, median<0.0016
q4_1::layers.22.attention.wo.weight : mse 0.00000296, maxerr 0.05219725, 95pct<0.0032, median<0.0014
q4_1::layers.22.attention.wq.weight : mse 0.00000375, maxerr 0.02614343, 95pct<0.0038, median<0.0016
q4_1::layers.22.attention.wv.weight : mse 0.00000298, maxerr 0.00633164, 95pct<0.0032, median<0.0014
q4_1::layers.22.feed_forward.w1.weight : mse 0.00000335, maxerr 0.01621208, 95pct<0.0034, median<0.0016
q4_1::layers.22.feed_forward.w2.weight : mse 0.00000302, maxerr 0.02524516, 95pct<0.0032, median<0.0014
q4_1::layers.22.feed_forward.w3.weight : mse 0.00000308, maxerr 0.01791126, 95pct<0.0032, median<0.0016
q4_1::layers.23.attention.wk.weight : mse 0.00000355, maxerr 0.01381835, 95pct<0.0038, median<0.0014
q4_1::layers.23.attention.wo.weight : mse 0.00000312, maxerr 0.05039060, 95pct<0.0032, median<0.0014
q4_1::layers.23.attention.wq.weight : mse 0.00000352, maxerr 0.02543131, 95pct<0.0036, median<0.0014
q4_1::layers.23.attention.wv.weight : mse 0.00000323, maxerr 0.00705466, 95pct<0.0034, median<0.0016
q4_1::layers.23.feed_forward.w1.weight : mse 0.00000336, maxerr 0.02019602, 95pct<0.0034, median<0.0016
q4_1::layers.23.feed_forward.w2.weight : mse 0.00000304, maxerr 0.02755737, 95pct<0.0032, median<0.0014
q4_1::layers.23.feed_forward.w3.weight : mse 0.00000309, maxerr 0.01500538, 95pct<0.0032, median<0.0016
q4_1::layers.24.attention.wk.weight : mse 0.00000358, maxerr 0.01357117, 95pct<0.0038, median<0.0014
q4_1::layers.24.attention.wo.weight : mse 0.00000320, maxerr 0.03517246, 95pct<0.0032, median<0.0016
q4_1::layers.24.attention.wq.weight : mse 0.00000353, maxerr 0.02697754, 95pct<0.0036, median<0.0014
q4_1::layers.24.attention.wv.weight : mse 0.00000330, maxerr 0.00695597, 95pct<0.0034, median<0.0016
q4_1::layers.24.feed_forward.w1.weight : mse 0.00000337, maxerr 0.01255596, 95pct<0.0034, median<0.0016
q4_1::layers.24.feed_forward.w2.weight : mse 0.00000307, maxerr 0.03697109, 95pct<0.0032, median<0.0016
q4_1::layers.24.feed_forward.w3.weight : mse 0.00000312, maxerr 0.01249239, 95pct<0.0032, median<0.0016
q4_1::layers.25.attention.wk.weight : mse 0.00000382, maxerr 0.01319379, 95pct<0.0038, median<0.0016
q4_1::layers.25.attention.wo.weight : mse 0.00000326, maxerr 0.03653157, 95pct<0.0034, median<0.0016
q4_1::layers.25.attention.wq.weight : mse 0.00000373, maxerr 0.02534175, 95pct<0.0036, median<0.0016
q4_1::layers.25.attention.wv.weight : mse 0.00000333, maxerr 0.00774231, 95pct<0.0034, median<0.0016
q4_1::layers.25.feed_forward.w1.weight : mse 0.00000339, maxerr 0.01365763, 95pct<0.0034, median<0.0016
q4_1::layers.25.feed_forward.w2.weight : mse 0.00000309, maxerr 0.02395630, 95pct<0.0032, median<0.0016
q4_1::layers.25.feed_forward.w3.weight : mse 0.00000315, maxerr 0.01177013, 95pct<0.0032, median<0.0016
q4_1::layers.26.attention.wk.weight : mse 0.00000370, maxerr 0.01424815, 95pct<0.0036, median<0.0016
q4_1::layers.26.attention.wo.weight : mse 0.00000345, maxerr 0.02384442, 95pct<0.0034, median<0.0016
q4_1::layers.26.attention.wq.weight : mse 0.00000361, maxerr 0.02352905, 95pct<0.0036, median<0.0016
q4_1::layers.26.attention.wv.weight : mse 0.00000353, maxerr 0.00762227, 95pct<0.0034, median<0.0016
q4_1::layers.26.feed_forward.w1.weight : mse 0.00000338, maxerr 0.02146912, 95pct<0.0034, median<0.0016
q4_1::layers.26.feed_forward.w2.weight : mse 0.00000313, maxerr 0.02818197, 95pct<0.0032, median<0.0016
q4_1::layers.26.feed_forward.w3.weight : mse 0.00000319, maxerr 0.02482224, 95pct<0.0032, median<0.0016
q4_1::layers.27.attention.wk.weight : mse 0.00000367, maxerr 0.01493329, 95pct<0.0036, median<0.0016
q4_1::layers.27.attention.wo.weight : mse 0.00000362, maxerr 0.05037433, 95pct<0.0034, median<0.0016
q4_1::layers.27.attention.wq.weight : mse 0.00000361, maxerr 0.02156782, 95pct<0.0036, median<0.0016
q4_1::layers.27.attention.wv.weight : mse 0.00000365, maxerr 0.00810165, 95pct<0.0036, median<0.0016
q4_1::layers.27.feed_forward.w1.weight : mse 0.00000338, maxerr 0.02540493, 95pct<0.0034, median<0.0016
q4_1::layers.27.feed_forward.w2.weight : mse 0.00000316, maxerr 0.02953517, 95pct<0.0032, median<0.0016
q4_1::layers.27.feed_forward.w3.weight : mse 0.00000322, maxerr 0.02640279, 95pct<0.0032, median<0.0016
q4_1::layers.28.attention.wk.weight : mse 0.00000351, maxerr 0.01595867, 95pct<0.0036, median<0.0014
q4_1::layers.28.attention.wo.weight : mse 0.00000370, maxerr 0.02981770, 95pct<0.0036, median<0.0016
q4_1::layers.28.attention.wq.weight : mse 0.00000347, maxerr 0.02494049, 95pct<0.0036, median<0.0014
q4_1::layers.28.attention.wv.weight : mse 0.00000369, maxerr 0.00791423, 95pct<0.0036, median<0.0016
q4_1::layers.28.feed_forward.w1.weight : mse 0.00000335, maxerr 0.02810669, 95pct<0.0034, median<0.0016
q4_1::layers.28.feed_forward.w2.weight : mse 0.00000319, maxerr 0.03309225, 95pct<0.0032, median<0.0016
q4_1::layers.28.feed_forward.w3.weight : mse 0.00000325, maxerr 0.02055053, 95pct<0.0032, median<0.0016
q4_1::layers.29.attention.wk.weight : mse 0.00000346, maxerr 0.01428223, 95pct<0.0036, median<0.0014
q4_1::layers.29.attention.wo.weight : mse 0.00000392, maxerr 0.03439641, 95pct<0.0036, median<0.0016
q4_1::layers.29.attention.wq.weight : mse 0.00000340, maxerr 0.02388712, 95pct<0.0036, median<0.0014
q4_1::layers.29.attention.wv.weight : mse 0.00000391, maxerr 0.00761922, 95pct<0.0036, median<0.0016
q4_1::layers.29.feed_forward.w1.weight : mse 0.00000337, maxerr 0.02038574, 95pct<0.0034, median<0.0016
q4_1::layers.29.feed_forward.w2.weight : mse 0.00000321, maxerr 0.05755107, 95pct<0.0032, median<0.0016
q4_1::layers.29.feed_forward.w3.weight : mse 0.00000329, maxerr 0.01542050, 95pct<0.0034, median<0.0016
q4_1::layers.3.attention.wk.weight : mse 0.00000611, maxerr 0.01627603, 95pct<0.0050, median<0.0018
q4_1::layers.3.attention.wo.weight : mse 0.00000167, maxerr 0.03495282, 95pct<0.0024, median<0.0012
q4_1::layers.3.attention.wq.weight : mse 0.00000552, maxerr 0.02804718, 95pct<0.0046, median<0.0018
q4_1::layers.3.attention.wv.weight : mse 0.00000167, maxerr 0.00555267, 95pct<0.0024, median<0.0012
q4_1::layers.3.feed_forward.w1.weight : mse 0.00000322, maxerr 0.02117920, 95pct<0.0032, median<0.0016
q4_1::layers.3.feed_forward.w2.weight : mse 0.00000273, maxerr 0.03491618, 95pct<0.0030, median<0.0014
q4_1::layers.3.feed_forward.w3.weight : mse 0.00000272, maxerr 0.01362103, 95pct<0.0030, median<0.0014
q4_1::layers.30.attention.wk.weight : mse 0.00000353, maxerr 0.01795453, 95pct<0.0036, median<0.0014
q4_1::layers.30.attention.wo.weight : mse 0.00000391, maxerr 0.04497075, 95pct<0.0036, median<0.0016
q4_1::layers.30.attention.wq.weight : mse 0.00000347, maxerr 0.02401968, 95pct<0.0036, median<0.0014
q4_1::layers.30.attention.wv.weight : mse 0.00000381, maxerr 0.00759277, 95pct<0.0036, median<0.0016
q4_1::layers.30.feed_forward.w1.weight : mse 0.00000341, maxerr 0.01782125, 95pct<0.0034, median<0.0016
q4_1::layers.30.feed_forward.w2.weight : mse 0.00000342, maxerr 0.12756348, 95pct<0.0032, median<0.0016
q4_1::layers.30.feed_forward.w3.weight : mse 0.00000335, maxerr 0.02418011, 95pct<0.0034, median<0.0016
q4_1::layers.31.attention.wk.weight : mse 0.00000375, maxerr 0.01362303, 95pct<0.0038, median<0.0016
q4_1::layers.31.attention.wo.weight : mse 0.00000319, maxerr 0.10161138, 95pct<0.0032, median<0.0014
q4_1::layers.31.attention.wq.weight : mse 0.00000357, maxerr 0.01820374, 95pct<0.0036, median<0.0016
q4_1::layers.31.attention.wv.weight : mse 0.00000310, maxerr 0.00997696, 95pct<0.0032, median<0.0014
q4_1::layers.31.feed_forward.w1.weight : mse 0.00000371, maxerr 0.02717184, 95pct<0.0036, median<0.0016
q4_1::layers.31.feed_forward.w2.weight : mse 0.00000336, maxerr 0.09575176, 95pct<0.0034, median<0.0016
q4_1::layers.31.feed_forward.w3.weight : mse 0.00000363, maxerr 0.03244019, 95pct<0.0034, median<0.0016
q4_1::layers.4.attention.wk.weight : mse 0.00000589, maxerr 0.01599121, 95pct<0.0048, median<0.0018
q4_1::layers.4.attention.wo.weight : mse 0.00000167, maxerr 0.02775061, 95pct<0.0024, median<0.0012
q4_1::layers.4.attention.wq.weight : mse 0.00000573, maxerr 0.02762246, 95pct<0.0046, median<0.0018
q4_1::layers.4.attention.wv.weight : mse 0.00000167, maxerr 0.00593816, 95pct<0.0024, median<0.0010
q4_1::layers.4.feed_forward.w1.weight : mse 0.00000330, maxerr 0.02607116, 95pct<0.0034, median<0.0016
q4_1::layers.4.feed_forward.w2.weight : mse 0.00000271, maxerr 0.04353638, 95pct<0.0030, median<0.0014
q4_1::layers.4.feed_forward.w3.weight : mse 0.00000274, maxerr 0.02257079, 95pct<0.0030, median<0.0014
q4_1::layers.5.attention.wk.weight : mse 0.00000527, maxerr 0.02016246, 95pct<0.0046, median<0.0016
q4_1::layers.5.attention.wo.weight : mse 0.00000171, maxerr 0.04142249, 95pct<0.0024, median<0.0012
q4_1::layers.5.attention.wq.weight : mse 0.00000516, maxerr 0.02691448, 95pct<0.0044, median<0.0016
q4_1::layers.5.attention.wv.weight : mse 0.00000173, maxerr 0.00809692, 95pct<0.0024, median<0.0012
q4_1::layers.5.feed_forward.w1.weight : mse 0.00000344, maxerr 0.02066040, 95pct<0.0034, median<0.0016
q4_1::layers.5.feed_forward.w2.weight : mse 0.00000266, maxerr 0.02678931, 95pct<0.0030, median<0.0014
q4_1::layers.5.feed_forward.w3.weight : mse 0.00000272, maxerr 0.01605021, 95pct<0.0030, median<0.0014
q4_1::layers.6.attention.wk.weight : mse 0.00000552, maxerr 0.01503804, 95pct<0.0046, median<0.0018
q4_1::layers.6.attention.wo.weight : mse 0.00000175, maxerr 0.03727213, 95pct<0.0024, median<0.0012
q4_1::layers.6.attention.wq.weight : mse 0.00000526, maxerr 0.03130698, 95pct<0.0044, median<0.0018
q4_1::layers.6.attention.wv.weight : mse 0.00000176, maxerr 0.00586955, 95pct<0.0024, median<0.0012
q4_1::layers.6.feed_forward.w1.weight : mse 0.00000334, maxerr 0.02278137, 95pct<0.0034, median<0.0016
q4_1::layers.6.feed_forward.w2.weight : mse 0.00000272, maxerr 0.03055978, 95pct<0.0030, median<0.0014
q4_1::layers.6.feed_forward.w3.weight : mse 0.00000279, maxerr 0.01386768, 95pct<0.0030, median<0.0014
q4_1::layers.7.attention.wk.weight : mse 0.00000518, maxerr 0.01637778, 95pct<0.0044, median<0.0016
q4_1::layers.7.attention.wo.weight : mse 0.00000182, maxerr 0.02817380, 95pct<0.0026, median<0.0012
q4_1::layers.7.attention.wq.weight : mse 0.00000509, maxerr 0.02885771, 95pct<0.0044, median<0.0016
q4_1::layers.7.attention.wv.weight : mse 0.00000187, maxerr 0.00640869, 95pct<0.0026, median<0.0012
q4_1::layers.7.feed_forward.w1.weight : mse 0.00000328, maxerr 0.01696777, 95pct<0.0034, median<0.0016
q4_1::layers.7.feed_forward.w2.weight : mse 0.00000274, maxerr 0.02849120, 95pct<0.0030, median<0.0014
q4_1::layers.7.feed_forward.w3.weight : mse 0.00000281, maxerr 0.01903725, 95pct<0.0030, median<0.0014
q4_1::layers.8.attention.wk.weight : mse 0.00000493, maxerr 0.01597899, 95pct<0.0044, median<0.0016
q4_1::layers.8.attention.wo.weight : mse 0.00000181, maxerr 0.02582398, 95pct<0.0026, median<0.0012
q4_1::layers.8.attention.wq.weight : mse 0.00000492, maxerr 0.02330780, 95pct<0.0044, median<0.0016
q4_1::layers.8.attention.wv.weight : mse 0.00000183, maxerr 0.00699462, 95pct<0.0026, median<0.0012
q4_1::layers.8.feed_forward.w1.weight : mse 0.00000328, maxerr 0.01851404, 95pct<0.0034, median<0.0016
q4_1::layers.8.feed_forward.w2.weight : mse 0.00000274, maxerr 0.02776897, 95pct<0.0030, median<0.0014
q4_1::layers.8.feed_forward.w3.weight : mse 0.00000283, maxerr 0.01309204, 95pct<0.0030, median<0.0014
q4_1::layers.9.attention.wk.weight : mse 0.00000468, maxerr 0.01326293, 95pct<0.0044, median<0.0016
q4_1::layers.9.attention.wo.weight : mse 0.00000178, maxerr 0.03066409, 95pct<0.0024, median<0.0012
q4_1::layers.9.attention.wq.weight : mse 0.00000461, maxerr 0.02470907, 95pct<0.0042, median<0.0016
q4_1::layers.9.attention.wv.weight : mse 0.00000180, maxerr 0.00619888, 95pct<0.0026, median<0.0012
q4_1::layers.9.feed_forward.w1.weight : mse 0.00000319, maxerr 0.02470452, 95pct<0.0034, median<0.0014
q4_1::layers.9.feed_forward.w2.weight : mse 0.00000278, maxerr 0.02815247, 95pct<0.0030, median<0.0014
q4_1::layers.9.feed_forward.w3.weight : mse 0.00000286, maxerr 0.02717841, 95pct<0.0032, median<0.0014
q4_1::output.weight : mse 0.00000251, maxerr 0.01462148, 95pct<0.0030, median<0.0014
q4_1::tok_embeddings.weight : mse 0.00000250, maxerr 0.01170197, 95pct<0.0030, median<0.0014
q4_1 : mse 0.00000318, maxerr 0.12756348, 95pct<0.0034, median<0.0014
quantize-stats after (7B)
note: source model is f16
testing 226 layers with max size 131072000, allocating 1572864000 bytes
q4_0::layers.0.attention.wk.weight : mse 0.00000946, maxerr 0.07012939, 95pct<0.0062, median<0.0018
q4_0::layers.0.attention.wo.weight : mse 0.00000114, maxerr 0.04718018, 95pct<0.0022, median<0.0008
q4_0::layers.0.attention.wq.weight : mse 0.00000990, maxerr 0.04913330, 95pct<0.0066, median<0.0018
q4_0::layers.0.attention.wv.weight : mse 0.00000138, maxerr 0.00809479, 95pct<0.0024, median<0.0010
q4_0::layers.0.feed_forward.w1.weight : mse 0.00000213, maxerr 0.06494141, 95pct<0.0026, median<0.0012
q4_0::layers.0.feed_forward.w2.weight : mse 0.00000315, maxerr 0.05014038, 95pct<0.0032, median<0.0016
q4_0::layers.0.feed_forward.w3.weight : mse 0.00000199, maxerr 0.01788330, 95pct<0.0026, median<0.0012
q4_0::layers.1.attention.wk.weight : mse 0.00000900, maxerr 0.04061890, 95pct<0.0064, median<0.0018
q4_0::layers.1.attention.wo.weight : mse 0.00000107, maxerr 0.06164551, 95pct<0.0020, median<0.0008
q4_0::layers.1.attention.wq.weight : mse 0.00000860, maxerr 0.03482056, 95pct<0.0062, median<0.0018
q4_0::layers.1.attention.wv.weight : mse 0.00000101, maxerr 0.00719452, 95pct<0.0020, median<0.0008
q4_0::layers.1.feed_forward.w1.weight : mse 0.00000343, maxerr 0.03784180, 95pct<0.0034, median<0.0016
q4_0::layers.1.feed_forward.w2.weight : mse 0.00000331, maxerr 0.05532837, 95pct<0.0032, median<0.0016
q4_0::layers.1.feed_forward.w3.weight : mse 0.00000309, maxerr 0.02270508, 95pct<0.0032, median<0.0016
q4_0::layers.10.attention.wk.weight : mse 0.00000574, maxerr 0.02407837, 95pct<0.0048, median<0.0018
q4_0::layers.10.attention.wo.weight : mse 0.00000233, maxerr 0.03552246, 95pct<0.0028, median<0.0014
q4_0::layers.10.attention.wq.weight : mse 0.00000561, maxerr 0.03735352, 95pct<0.0046, median<0.0018
q4_0::layers.10.attention.wv.weight : mse 0.00000235, maxerr 0.01551819, 95pct<0.0028, median<0.0014
q4_0::layers.10.feed_forward.w1.weight : mse 0.00000380, maxerr 0.02473450, 95pct<0.0036, median<0.0016
q4_0::layers.10.feed_forward.w2.weight : mse 0.00000344, maxerr 0.04382324, 95pct<0.0034, median<0.0016
q4_0::layers.10.feed_forward.w3.weight : mse 0.00000355, maxerr 0.02250671, 95pct<0.0034, median<0.0016
q4_0::layers.11.attention.wk.weight : mse 0.00000614, maxerr 0.02574158, 95pct<0.0048, median<0.0018
q4_0::layers.11.attention.wo.weight : mse 0.00000256, maxerr 0.03140259, 95pct<0.0030, median<0.0014
q4_0::layers.11.attention.wq.weight : mse 0.00000594, maxerr 0.04638672, 95pct<0.0046, median<0.0018
q4_0::layers.11.attention.wv.weight : mse 0.00000260, maxerr 0.01353455, 95pct<0.0030, median<0.0014
q4_0::layers.11.feed_forward.w1.weight : mse 0.00000378, maxerr 0.02500916, 95pct<0.0036, median<0.0016
q4_0::layers.11.feed_forward.w2.weight : mse 0.00000349, maxerr 0.05606079, 95pct<0.0034, median<0.0016
q4_0::layers.11.feed_forward.w3.weight : mse 0.00000359, maxerr 0.02583313, 95pct<0.0034, median<0.0016
q4_0::layers.12.attention.wk.weight : mse 0.00000559, maxerr 0.02272034, 95pct<0.0046, median<0.0018
q4_0::layers.12.attention.wo.weight : mse 0.00000247, maxerr 0.02410889, 95pct<0.0028, median<0.0014
q4_0::layers.12.attention.wq.weight : mse 0.00000541, maxerr 0.03787231, 95pct<0.0046, median<0.0018
q4_0::layers.12.attention.wv.weight : mse 0.00000243, maxerr 0.00985718, 95pct<0.0028, median<0.0014
q4_0::layers.12.feed_forward.w1.weight : mse 0.00000382, maxerr 0.03527832, 95pct<0.0036, median<0.0016
q4_0::layers.12.feed_forward.w2.weight : mse 0.00000350, maxerr 0.05621338, 95pct<0.0034, median<0.0016
q4_0::layers.12.feed_forward.w3.weight : mse 0.00000364, maxerr 0.01757812, 95pct<0.0034, median<0.0016
q4_0::layers.13.attention.wk.weight : mse 0.00000533, maxerr 0.02510071, 95pct<0.0046, median<0.0016
q4_0::layers.13.attention.wo.weight : mse 0.00000265, maxerr 0.04525757, 95pct<0.0030, median<0.0014
q4_0::layers.13.attention.wq.weight : mse 0.00000516, maxerr 0.03643799, 95pct<0.0044, median<0.0016
q4_0::layers.13.attention.wv.weight : mse 0.00000265, maxerr 0.01044464, 95pct<0.0030, median<0.0014
q4_0::layers.13.feed_forward.w1.weight : mse 0.00000379, maxerr 0.02149963, 95pct<0.0036, median<0.0016
q4_0::layers.13.feed_forward.w2.weight : mse 0.00000356, maxerr 0.03344727, 95pct<0.0034, median<0.0016
q4_0::layers.13.feed_forward.w3.weight : mse 0.00000371, maxerr 0.01843262, 95pct<0.0036, median<0.0016
q4_0::layers.14.attention.wk.weight : mse 0.00000532, maxerr 0.02272034, 95pct<0.0044, median<0.0018
q4_0::layers.14.attention.wo.weight : mse 0.00000267, maxerr 0.03359985, 95pct<0.0030, median<0.0014
q4_0::layers.14.attention.wq.weight : mse 0.00000523, maxerr 0.04104614, 95pct<0.0044, median<0.0018
q4_0::layers.14.attention.wv.weight : mse 0.00000269, maxerr 0.00988770, 95pct<0.0030, median<0.0014
q4_0::layers.14.feed_forward.w1.weight : mse 0.00000378, maxerr 0.02416992, 95pct<0.0036, median<0.0016
q4_0::layers.14.feed_forward.w2.weight : mse 0.00000360, maxerr 0.05963135, 95pct<0.0034, median<0.0016
q4_0::layers.14.feed_forward.w3.weight : mse 0.00000373, maxerr 0.02426147, 95pct<0.0036, median<0.0016
q4_0::layers.15.attention.wk.weight : mse 0.00000542, maxerr 0.02012634, 95pct<0.0046, median<0.0018
q4_0::layers.15.attention.wo.weight : mse 0.00000268, maxerr 0.02880859, 95pct<0.0030, median<0.0014
q4_0::layers.15.attention.wq.weight : mse 0.00000523, maxerr 0.03628540, 95pct<0.0044, median<0.0018
q4_0::layers.15.attention.wv.weight : mse 0.00000270, maxerr 0.00939941, 95pct<0.0030, median<0.0014
q4_0::layers.15.feed_forward.w1.weight : mse 0.00000378, maxerr 0.02140808, 95pct<0.0036, median<0.0016
q4_0::layers.15.feed_forward.w2.weight : mse 0.00000360, maxerr 0.06188965, 95pct<0.0034, median<0.0016
q4_0::layers.15.feed_forward.w3.weight : mse 0.00000374, maxerr 0.02241516, 95pct<0.0036, median<0.0016
q4_0::layers.16.attention.wk.weight : mse 0.00000535, maxerr 0.01998901, 95pct<0.0044, median<0.0018
q4_0::layers.16.attention.wo.weight : mse 0.00000301, maxerr 0.04467773, 95pct<0.0032, median<0.0014
q4_0::layers.16.attention.wq.weight : mse 0.00000509, maxerr 0.04324341, 95pct<0.0044, median<0.0018
q4_0::layers.16.attention.wv.weight : mse 0.00000306, maxerr 0.00990295, 95pct<0.0032, median<0.0014
q4_0::layers.16.feed_forward.w1.weight : mse 0.00000383, maxerr 0.02427673, 95pct<0.0036, median<0.0016
q4_0::layers.16.feed_forward.w2.weight : mse 0.00000359, maxerr 0.05804443, 95pct<0.0034, median<0.0016
q4_0::layers.16.feed_forward.w3.weight : mse 0.00000371, maxerr 0.02345276, 95pct<0.0036, median<0.0016
q4_0::layers.17.attention.wk.weight : mse 0.00000508, maxerr 0.01977539, 95pct<0.0044, median<0.0018
q4_0::layers.17.attention.wo.weight : mse 0.00000309, maxerr 0.02990723, 95pct<0.0032, median<0.0014
q4_0::layers.17.attention.wq.weight : mse 0.00000487, maxerr 0.04873657, 95pct<0.0042, median<0.0018
q4_0::layers.17.attention.wv.weight : mse 0.00000310, maxerr 0.01419830, 95pct<0.0032, median<0.0014
q4_0::layers.17.feed_forward.w1.weight : mse 0.00000385, maxerr 0.01963806, 95pct<0.0036, median<0.0016
q4_0::layers.17.feed_forward.w2.weight : mse 0.00000362, maxerr 0.04980469, 95pct<0.0034, median<0.0016
q4_0::layers.17.feed_forward.w3.weight : mse 0.00000373, maxerr 0.02487183, 95pct<0.0036, median<0.0016
q4_0::layers.18.attention.wk.weight : mse 0.00000489, maxerr 0.01959229, 95pct<0.0044, median<0.0016
q4_0::layers.18.attention.wo.weight : mse 0.00000307, maxerr 0.05526733, 95pct<0.0032, median<0.0014
q4_0::layers.18.attention.wq.weight : mse 0.00000477, maxerr 0.04193115, 95pct<0.0042, median<0.0016
q4_0::layers.18.attention.wv.weight : mse 0.00000308, maxerr 0.00922394, 95pct<0.0032, median<0.0014
q4_0::layers.18.feed_forward.w1.weight : mse 0.00000391, maxerr 0.02268982, 95pct<0.0036, median<0.0016
q4_0::layers.18.feed_forward.w2.weight : mse 0.00000360, maxerr 0.06439209, 95pct<0.0034, median<0.0016
q4_0::layers.18.feed_forward.w3.weight : mse 0.00000371, maxerr 0.01829529, 95pct<0.0036, median<0.0016
q4_0::layers.19.attention.wk.weight : mse 0.00000471, maxerr 0.02081299, 95pct<0.0042, median<0.0016
q4_0::layers.19.attention.wo.weight : mse 0.00000333, maxerr 0.04681396, 95pct<0.0034, median<0.0016
q4_0::layers.19.attention.wq.weight : mse 0.00000460, maxerr 0.04876709, 95pct<0.0042, median<0.0016
q4_0::layers.19.attention.wv.weight : mse 0.00000339, maxerr 0.01137543, 95pct<0.0034, median<0.0016
q4_0::layers.19.feed_forward.w1.weight : mse 0.00000395, maxerr 0.03121948, 95pct<0.0036, median<0.0016
q4_0::layers.19.feed_forward.w2.weight : mse 0.00000361, maxerr 0.04312134, 95pct<0.0034, median<0.0016
q4_0::layers.19.feed_forward.w3.weight : mse 0.00000369, maxerr 0.02177429, 95pct<0.0034, median<0.0016
q4_0::layers.2.attention.wk.weight : mse 0.00001086, maxerr 0.03179932, 95pct<0.0066, median<0.0022
q4_0::layers.2.attention.wo.weight : mse 0.00000149, maxerr 0.04443359, 95pct<0.0022, median<0.0010
q4_0::layers.2.attention.wq.weight : mse 0.00001007, maxerr 0.03594971, 95pct<0.0064, median<0.0022
q4_0::layers.2.attention.wv.weight : mse 0.00000144, maxerr 0.01062012, 95pct<0.0022, median<0.0010
q4_0::layers.2.feed_forward.w1.weight : mse 0.00000381, maxerr 0.04077148, 95pct<0.0036, median<0.0016
q4_0::layers.2.feed_forward.w2.weight : mse 0.00000328, maxerr 0.09649658, 95pct<0.0032, median<0.0016
q4_0::layers.2.feed_forward.w3.weight : mse 0.00000317, maxerr 0.03201294, 95pct<0.0032, median<0.0016
q4_0::layers.20.attention.wk.weight : mse 0.00000486, maxerr 0.02493286, 95pct<0.0042, median<0.0016
q4_0::layers.20.attention.wo.weight : mse 0.00000350, maxerr 0.03179932, 95pct<0.0034, median<0.0016
q4_0::layers.20.attention.wq.weight : mse 0.00000473, maxerr 0.05462646, 95pct<0.0042, median<0.0016
q4_0::layers.20.attention.wv.weight : mse 0.00000360, maxerr 0.01089478, 95pct<0.0034, median<0.0016
q4_0::layers.20.feed_forward.w1.weight : mse 0.00000400, maxerr 0.02357483, 95pct<0.0036, median<0.0016
q4_0::layers.20.feed_forward.w2.weight : mse 0.00000362, maxerr 0.06982422, 95pct<0.0034, median<0.0016
q4_0::layers.20.feed_forward.w3.weight : mse 0.00000369, maxerr 0.01661682, 95pct<0.0034, median<0.0016
q4_0::layers.21.attention.wk.weight : mse 0.00000451, maxerr 0.02557373, 95pct<0.0042, median<0.0016
q4_0::layers.21.attention.wo.weight : mse 0.00000354, maxerr 0.07818604, 95pct<0.0034, median<0.0016
q4_0::layers.21.attention.wq.weight : mse 0.00000443, maxerr 0.05007935, 95pct<0.0040, median<0.0016
q4_0::layers.21.attention.wv.weight : mse 0.00000365, maxerr 0.01040649, 95pct<0.0036, median<0.0016
q4_0::layers.21.feed_forward.w1.weight : mse 0.00000403, maxerr 0.02252197, 95pct<0.0036, median<0.0016
q4_0::layers.21.feed_forward.w2.weight : mse 0.00000362, maxerr 0.03781128, 95pct<0.0034, median<0.0016
q4_0::layers.21.feed_forward.w3.weight : mse 0.00000368, maxerr 0.01501465, 95pct<0.0034, median<0.0016
q4_0::layers.22.attention.wk.weight : mse 0.00000465, maxerr 0.01992798, 95pct<0.0042, median<0.0016
q4_0::layers.22.attention.wo.weight : mse 0.00000357, maxerr 0.09454346, 95pct<0.0034, median<0.0016
q4_0::layers.22.attention.wq.weight : mse 0.00000458, maxerr 0.04550171, 95pct<0.0040, median<0.0016
q4_0::layers.22.attention.wv.weight : mse 0.00000361, maxerr 0.01099396, 95pct<0.0034, median<0.0016
q4_0::layers.22.feed_forward.w1.weight : mse 0.00000405, maxerr 0.02517700, 95pct<0.0036, median<0.0018
q4_0::layers.22.feed_forward.w2.weight : mse 0.00000365, maxerr 0.04281616, 95pct<0.0034, median<0.0016
q4_0::layers.22.feed_forward.w3.weight : mse 0.00000371, maxerr 0.03140259, 95pct<0.0036, median<0.0016
q4_0::layers.23.attention.wk.weight : mse 0.00000433, maxerr 0.02102661, 95pct<0.0040, median<0.0016
q4_0::layers.23.attention.wo.weight : mse 0.00000377, maxerr 0.04870605, 95pct<0.0036, median<0.0016
q4_0::layers.23.attention.wq.weight : mse 0.00000430, maxerr 0.04418945, 95pct<0.0040, median<0.0016
q4_0::layers.23.attention.wv.weight : mse 0.00000390, maxerr 0.01161957, 95pct<0.0036, median<0.0016
q4_0::layers.23.feed_forward.w1.weight : mse 0.00000406, maxerr 0.03436279, 95pct<0.0036, median<0.0018
q4_0::layers.23.feed_forward.w2.weight : mse 0.00000367, maxerr 0.04855347, 95pct<0.0034, median<0.0016
q4_0::layers.23.feed_forward.w3.weight : mse 0.00000373, maxerr 0.02526855, 95pct<0.0036, median<0.0016
q4_0::layers.24.attention.wk.weight : mse 0.00000436, maxerr 0.02143860, 95pct<0.0040, median<0.0016
q4_0::layers.24.attention.wo.weight : mse 0.00000386, maxerr 0.05621338, 95pct<0.0036, median<0.0016
q4_0::layers.24.attention.wq.weight : mse 0.00000431, maxerr 0.04943848, 95pct<0.0040, median<0.0016
q4_0::layers.24.attention.wv.weight : mse 0.00000399, maxerr 0.01126862, 95pct<0.0036, median<0.0016
q4_0::layers.24.feed_forward.w1.weight : mse 0.00000407, maxerr 0.02159119, 95pct<0.0036, median<0.0018
q4_0::layers.24.feed_forward.w2.weight : mse 0.00000371, maxerr 0.06005859, 95pct<0.0034, median<0.0016
q4_0::layers.24.feed_forward.w3.weight : mse 0.00000377, maxerr 0.02104187, 95pct<0.0036, median<0.0016
q4_0::layers.25.attention.wk.weight : mse 0.00000464, maxerr 0.02005005, 95pct<0.0040, median<0.0016
q4_0::layers.25.attention.wo.weight : mse 0.00000393, maxerr 0.04763794, 95pct<0.0036, median<0.0016
q4_0::layers.25.attention.wq.weight : mse 0.00000455, maxerr 0.03808594, 95pct<0.0040, median<0.0016
q4_0::layers.25.attention.wv.weight : mse 0.00000402, maxerr 0.01160431, 95pct<0.0036, median<0.0016
q4_0::layers.25.feed_forward.w1.weight : mse 0.00000409, maxerr 0.02044678, 95pct<0.0036, median<0.0018
q4_0::layers.25.feed_forward.w2.weight : mse 0.00000373, maxerr 0.03298950, 95pct<0.0036, median<0.0016
q4_0::layers.25.feed_forward.w3.weight : mse 0.00000380, maxerr 0.01884460, 95pct<0.0036, median<0.0016
q4_0::layers.26.attention.wk.weight : mse 0.00000449, maxerr 0.02630615, 95pct<0.0040, median<0.0016
q4_0::layers.26.attention.wo.weight : mse 0.00000416, maxerr 0.02603149, 95pct<0.0038, median<0.0018
q4_0::layers.26.attention.wq.weight : mse 0.00000440, maxerr 0.03735352, 95pct<0.0040, median<0.0016
q4_0::layers.26.attention.wv.weight : mse 0.00000426, maxerr 0.01278687, 95pct<0.0038, median<0.0018
q4_0::layers.26.feed_forward.w1.weight : mse 0.00000409, maxerr 0.03500366, 95pct<0.0036, median<0.0018
q4_0::layers.26.feed_forward.w2.weight : mse 0.00000378, maxerr 0.04370117, 95pct<0.0036, median<0.0016
q4_0::layers.26.feed_forward.w3.weight : mse 0.00000386, maxerr 0.03059387, 95pct<0.0036, median<0.0016
q4_0::layers.27.attention.wk.weight : mse 0.00000446, maxerr 0.02406311, 95pct<0.0040, median<0.0016
q4_0::layers.27.attention.wo.weight : mse 0.00000437, maxerr 0.07098389, 95pct<0.0038, median<0.0018
q4_0::layers.27.attention.wq.weight : mse 0.00000442, maxerr 0.04006958, 95pct<0.0040, median<0.0016
q4_0::layers.27.attention.wv.weight : mse 0.00000441, maxerr 0.01357269, 95pct<0.0038, median<0.0018
q4_0::layers.27.feed_forward.w1.weight : mse 0.00000408, maxerr 0.02963257, 95pct<0.0036, median<0.0018
q4_0::layers.27.feed_forward.w2.weight : mse 0.00000383, maxerr 0.04663086, 95pct<0.0036, median<0.0016
q4_0::layers.27.feed_forward.w3.weight : mse 0.00000389, maxerr 0.04153442, 95pct<0.0036, median<0.0016
q4_0::layers.28.attention.wk.weight : mse 0.00000427, maxerr 0.02304077, 95pct<0.0040, median<0.0016
q4_0::layers.28.attention.wo.weight : mse 0.00000446, maxerr 0.05538940, 95pct<0.0038, median<0.0018
q4_0::layers.28.attention.wq.weight : mse 0.00000424, maxerr 0.04208374, 95pct<0.0040, median<0.0016
q4_0::layers.28.attention.wv.weight : mse 0.00000446, maxerr 0.01184082, 95pct<0.0038, median<0.0018
q4_0::layers.28.feed_forward.w1.weight : mse 0.00000405, maxerr 0.03170776, 95pct<0.0036, median<0.0016
q4_0::layers.28.feed_forward.w2.weight : mse 0.00000387, maxerr 0.05294800, 95pct<0.0036, median<0.0016
q4_0::layers.28.feed_forward.w3.weight : mse 0.00000393, maxerr 0.03025818, 95pct<0.0036, median<0.0016
q4_0::layers.29.attention.wk.weight : mse 0.00000421, maxerr 0.01965332, 95pct<0.0040, median<0.0016
q4_0::layers.29.attention.wo.weight : mse 0.00000473, maxerr 0.04461670, 95pct<0.0040, median<0.0018
q4_0::layers.29.attention.wq.weight : mse 0.00000417, maxerr 0.04244995, 95pct<0.0038, median<0.0016
q4_0::layers.29.attention.wv.weight : mse 0.00000473, maxerr 0.01258850, 95pct<0.0040, median<0.0018
q4_0::layers.29.feed_forward.w1.weight : mse 0.00000407, maxerr 0.03314209, 95pct<0.0036, median<0.0016
q4_0::layers.29.feed_forward.w2.weight : mse 0.00000391, maxerr 0.09802246, 95pct<0.0036, median<0.0016
q4_0::layers.29.feed_forward.w3.weight : mse 0.00000397, maxerr 0.02760315, 95pct<0.0036, median<0.0016
q4_0::layers.3.attention.wk.weight : mse 0.00000748, maxerr 0.02275085, 95pct<0.0054, median<0.0020
q4_0::layers.3.attention.wo.weight : mse 0.00000202, maxerr 0.05377197, 95pct<0.0026, median<0.0012
q4_0::layers.3.attention.wq.weight : mse 0.00000683, maxerr 0.04766846, 95pct<0.0050, median<0.0020
q4_0::layers.3.attention.wv.weight : mse 0.00000202, maxerr 0.00859070, 95pct<0.0026, median<0.0012
q4_0::layers.3.feed_forward.w1.weight : mse 0.00000389, maxerr 0.03158569, 95pct<0.0036, median<0.0016
q4_0::layers.3.feed_forward.w2.weight : mse 0.00000331, maxerr 0.05627441, 95pct<0.0034, median<0.0016
q4_0::layers.3.feed_forward.w3.weight : mse 0.00000329, maxerr 0.02278137, 95pct<0.0034, median<0.0016
q4_0::layers.30.attention.wk.weight : mse 0.00000430, maxerr 0.02133179, 95pct<0.0040, median<0.0016
q4_0::layers.30.attention.wo.weight : mse 0.00000472, maxerr 0.06579590, 95pct<0.0040, median<0.0018
q4_0::layers.30.attention.wq.weight : mse 0.00000427, maxerr 0.04168701, 95pct<0.0038, median<0.0016
q4_0::layers.30.attention.wv.weight : mse 0.00000461, maxerr 0.01303864, 95pct<0.0040, median<0.0018
q4_0::layers.30.feed_forward.w1.weight : mse 0.00000412, maxerr 0.02958679, 95pct<0.0038, median<0.0016
q4_0::layers.30.feed_forward.w2.weight : mse 0.00000410, maxerr 0.18200684, 95pct<0.0036, median<0.0016
q4_0::layers.30.feed_forward.w3.weight : mse 0.00000405, maxerr 0.03591919, 95pct<0.0036, median<0.0018
q4_0::layers.31.attention.wk.weight : mse 0.00000459, maxerr 0.02066040, 95pct<0.0040, median<0.0016
q4_0::layers.31.attention.wo.weight : mse 0.00000385, maxerr 0.17346191, 95pct<0.0036, median<0.0016
q4_0::layers.31.attention.wq.weight : mse 0.00000440, maxerr 0.02816772, 95pct<0.0040, median<0.0016
q4_0::layers.31.attention.wv.weight : mse 0.00000375, maxerr 0.01520538, 95pct<0.0036, median<0.0016
q4_0::layers.31.feed_forward.w1.weight : mse 0.00000450, maxerr 0.02647400, 95pct<0.0038, median<0.0018
q4_0::layers.31.feed_forward.w2.weight : mse 0.00000414, maxerr 0.11260986, 95pct<0.0036, median<0.0018
q4_0::layers.31.feed_forward.w3.weight : mse 0.00000440, maxerr 0.04486084, 95pct<0.0038, median<0.0018
q4_0::layers.4.attention.wk.weight : mse 0.00000719, maxerr 0.02165222, 95pct<0.0052, median<0.0020
q4_0::layers.4.attention.wo.weight : mse 0.00000202, maxerr 0.04003906, 95pct<0.0026, median<0.0012
q4_0::layers.4.attention.wq.weight : mse 0.00000707, maxerr 0.04748535, 95pct<0.0052, median<0.0020
q4_0::layers.4.attention.wv.weight : mse 0.00000202, maxerr 0.00906372, 95pct<0.0026, median<0.0012
q4_0::layers.4.feed_forward.w1.weight : mse 0.00000399, maxerr 0.03872681, 95pct<0.0036, median<0.0016
q4_0::layers.4.feed_forward.w2.weight : mse 0.00000328, maxerr 0.05072021, 95pct<0.0032, median<0.0016
q4_0::layers.4.feed_forward.w3.weight : mse 0.00000331, maxerr 0.03533936, 95pct<0.0034, median<0.0016
q4_0::layers.5.attention.wk.weight : mse 0.00000640, maxerr 0.03253174, 95pct<0.0050, median<0.0018
q4_0::layers.5.attention.wo.weight : mse 0.00000207, maxerr 0.04260254, 95pct<0.0026, median<0.0012
q4_0::layers.5.attention.wq.weight : mse 0.00000631, maxerr 0.04281616, 95pct<0.0048, median<0.0018
q4_0::layers.5.attention.wv.weight : mse 0.00000209, maxerr 0.01441193, 95pct<0.0026, median<0.0012
q4_0::layers.5.feed_forward.w1.weight : mse 0.00000416, maxerr 0.03350830, 95pct<0.0038, median<0.0018
q4_0::layers.5.feed_forward.w2.weight : mse 0.00000322, maxerr 0.04428101, 95pct<0.0032, median<0.0016
q4_0::layers.5.feed_forward.w3.weight : mse 0.00000329, maxerr 0.02728271, 95pct<0.0034, median<0.0016
q4_0::layers.6.attention.wk.weight : mse 0.00000670, maxerr 0.02120972, 95pct<0.0050, median<0.0020
q4_0::layers.6.attention.wo.weight : mse 0.00000211, maxerr 0.05706787, 95pct<0.0026, median<0.0012
q4_0::layers.6.attention.wq.weight : mse 0.00000641, maxerr 0.04986572, 95pct<0.0048, median<0.0020
q4_0::layers.6.attention.wv.weight : mse 0.00000212, maxerr 0.00904083, 95pct<0.0028, median<0.0012
q4_0::layers.6.feed_forward.w1.weight : mse 0.00000404, maxerr 0.04025269, 95pct<0.0036, median<0.0016
q4_0::layers.6.feed_forward.w2.weight : mse 0.00000329, maxerr 0.04974365, 95pct<0.0032, median<0.0016
q4_0::layers.6.feed_forward.w3.weight : mse 0.00000337, maxerr 0.02461243, 95pct<0.0034, median<0.0016
q4_0::layers.7.attention.wk.weight : mse 0.00000629, maxerr 0.02215576, 95pct<0.0050, median<0.0018
q4_0::layers.7.attention.wo.weight : mse 0.00000220, maxerr 0.03393555, 95pct<0.0028, median<0.0012
q4_0::layers.7.attention.wq.weight : mse 0.00000619, maxerr 0.04800415, 95pct<0.0048, median<0.0018
q4_0::layers.7.attention.wv.weight : mse 0.00000226, maxerr 0.00874329, 95pct<0.0028, median<0.0012
q4_0::layers.7.feed_forward.w1.weight : mse 0.00000397, maxerr 0.02743530, 95pct<0.0036, median<0.0016
q4_0::layers.7.feed_forward.w2.weight : mse 0.00000331, maxerr 0.04418945, 95pct<0.0034, median<0.0016
q4_0::layers.7.feed_forward.w3.weight : mse 0.00000340, maxerr 0.02548218, 95pct<0.0034, median<0.0016
q4_0::layers.8.attention.wk.weight : mse 0.00000599, maxerr 0.02326965, 95pct<0.0048, median<0.0018
q4_0::layers.8.attention.wo.weight : mse 0.00000218, maxerr 0.03485107, 95pct<0.0028, median<0.0012
q4_0::layers.8.attention.wq.weight : mse 0.00000598, maxerr 0.04147339, 95pct<0.0048, median<0.0018
q4_0::layers.8.attention.wv.weight : mse 0.00000221, maxerr 0.01078796, 95pct<0.0028, median<0.0012
q4_0::layers.8.feed_forward.w1.weight : mse 0.00000397, maxerr 0.02899170, 95pct<0.0036, median<0.0016
q4_0::layers.8.feed_forward.w2.weight : mse 0.00000332, maxerr 0.03820801, 95pct<0.0034, median<0.0016
q4_0::layers.8.feed_forward.w3.weight : mse 0.00000342, maxerr 0.02285767, 95pct<0.0034, median<0.0016
q4_0::layers.9.attention.wk.weight : mse 0.00000567, maxerr 0.02212524, 95pct<0.0048, median<0.0018
q4_0::layers.9.attention.wo.weight : mse 0.00000215, maxerr 0.03619385, 95pct<0.0028, median<0.0012
q4_0::layers.9.attention.wq.weight : mse 0.00000560, maxerr 0.04025269, 95pct<0.0046, median<0.0018
q4_0::layers.9.attention.wv.weight : mse 0.00000218, maxerr 0.00840759, 95pct<0.0028, median<0.0012
q4_0::layers.9.feed_forward.w1.weight : mse 0.00000386, maxerr 0.04083252, 95pct<0.0036, median<0.0016
q4_0::layers.9.feed_forward.w2.weight : mse 0.00000337, maxerr 0.04580688, 95pct<0.0034, median<0.0016
q4_0::layers.9.feed_forward.w3.weight : mse 0.00000346, maxerr 0.04849243, 95pct<0.0034, median<0.0016
q4_0::output.weight : mse 0.00000308, maxerr 0.02610779, 95pct<0.0032, median<0.0014
q4_0::tok_embeddings.weight : mse 0.00000302, maxerr 0.01635742, 95pct<0.0032, median<0.0014
q4_0 : mse 0.00000386, maxerr 0.18200684, 95pct<0.0036, median<0.0016
q4_1::layers.0.attention.wk.weight : mse 0.00000684, maxerr 0.04107666, 95pct<0.0054, median<0.0016
q4_1::layers.0.attention.wo.weight : mse 0.00000092, maxerr 0.02982175, 95pct<0.0020, median<0.0008
q4_1::layers.0.attention.wq.weight : mse 0.00000702, maxerr 0.02834333, 95pct<0.0056, median<0.0016
q4_1::layers.0.attention.wv.weight : mse 0.00000114, maxerr 0.00560303, 95pct<0.0022, median<0.0008
q4_1::layers.0.feed_forward.w1.weight : mse 0.00000175, maxerr 0.03655243, 95pct<0.0024, median<0.0012
q4_1::layers.0.feed_forward.w2.weight : mse 0.00000259, maxerr 0.04300943, 95pct<0.0030, median<0.0014
q4_1::layers.0.feed_forward.w3.weight : mse 0.00000165, maxerr 0.01006266, 95pct<0.0024, median<0.0012
q4_1::layers.1.attention.wk.weight : mse 0.00000728, maxerr 0.02334900, 95pct<0.0058, median<0.0016
q4_1::layers.1.attention.wo.weight : mse 0.00000086, maxerr 0.03453889, 95pct<0.0018, median<0.0008
q4_1::layers.1.attention.wq.weight : mse 0.00000700, maxerr 0.01987410, 95pct<0.0056, median<0.0016
q4_1::layers.1.attention.wv.weight : mse 0.00000083, maxerr 0.00478211, 95pct<0.0018, median<0.0008
q4_1::layers.1.feed_forward.w1.weight : mse 0.00000283, maxerr 0.02051294, 95pct<0.0030, median<0.0014
q4_1::layers.1.feed_forward.w2.weight : mse 0.00000272, maxerr 0.03843182, 95pct<0.0030, median<0.0014
q4_1::layers.1.feed_forward.w3.weight : mse 0.00000255, maxerr 0.01320738, 95pct<0.0030, median<0.0014
q4_1::layers.10.attention.wk.weight : mse 0.00000472, maxerr 0.01563987, 95pct<0.0044, median<0.0016
q4_1::layers.10.attention.wo.weight : mse 0.00000193, maxerr 0.02667642, 95pct<0.0026, median<0.0012
q4_1::layers.10.attention.wq.weight : mse 0.00000462, maxerr 0.02052003, 95pct<0.0042, median<0.0016
q4_1::layers.10.attention.wv.weight : mse 0.00000194, maxerr 0.00943857, 95pct<0.0026, median<0.0012
q4_1::layers.10.feed_forward.w1.weight : mse 0.00000314, maxerr 0.01556396, 95pct<0.0032, median<0.0014
q4_1::layers.10.feed_forward.w2.weight : mse 0.00000283, maxerr 0.02537537, 95pct<0.0030, median<0.0014
q4_1::layers.10.feed_forward.w3.weight : mse 0.00000294, maxerr 0.01292909, 95pct<0.0032, median<0.0014
q4_1::layers.11.attention.wk.weight : mse 0.00000505, maxerr 0.01603444, 95pct<0.0044, median<0.0016
q4_1::layers.11.attention.wo.weight : mse 0.00000212, maxerr 0.02708334, 95pct<0.0028, median<0.0012
q4_1::layers.11.attention.wq.weight : mse 0.00000490, maxerr 0.02761781, 95pct<0.0042, median<0.0016
q4_1::layers.11.attention.wv.weight : mse 0.00000215, maxerr 0.00829771, 95pct<0.0028, median<0.0012
q4_1::layers.11.feed_forward.w1.weight : mse 0.00000313, maxerr 0.01594034, 95pct<0.0032, median<0.0014
q4_1::layers.11.feed_forward.w2.weight : mse 0.00000288, maxerr 0.03227139, 95pct<0.0032, median<0.0014
q4_1::layers.11.feed_forward.w3.weight : mse 0.00000297, maxerr 0.01712444, 95pct<0.0032, median<0.0014
q4_1::layers.12.attention.wk.weight : mse 0.00000460, maxerr 0.01596579, 95pct<0.0042, median<0.0016
q4_1::layers.12.attention.wo.weight : mse 0.00000205, maxerr 0.02017212, 95pct<0.0026, median<0.0012
q4_1::layers.12.attention.wq.weight : mse 0.00000446, maxerr 0.02285360, 95pct<0.0042, median<0.0016
q4_1::layers.12.attention.wv.weight : mse 0.00000200, maxerr 0.00573961, 95pct<0.0026, median<0.0012
q4_1::layers.12.feed_forward.w1.weight : mse 0.00000316, maxerr 0.02284165, 95pct<0.0032, median<0.0014
q4_1::layers.12.feed_forward.w2.weight : mse 0.00000289, maxerr 0.03540853, 95pct<0.0032, median<0.0014
q4_1::layers.12.feed_forward.w3.weight : mse 0.00000301, maxerr 0.01079203, 95pct<0.0032, median<0.0014
q4_1::layers.13.attention.wk.weight : mse 0.00000439, maxerr 0.01437837, 95pct<0.0042, median<0.0016
q4_1::layers.13.attention.wo.weight : mse 0.00000220, maxerr 0.03025717, 95pct<0.0028, median<0.0012
q4_1::layers.13.attention.wq.weight : mse 0.00000425, maxerr 0.02128702, 95pct<0.0040, median<0.0016
q4_1::layers.13.attention.wv.weight : mse 0.00000219, maxerr 0.00622152, 95pct<0.0028, median<0.0012
q4_1::layers.13.feed_forward.w1.weight : mse 0.00000313, maxerr 0.01500246, 95pct<0.0032, median<0.0014
q4_1::layers.13.feed_forward.w2.weight : mse 0.00000294, maxerr 0.02633134, 95pct<0.0032, median<0.0014
q4_1::layers.13.feed_forward.w3.weight : mse 0.00000307, maxerr 0.01099548, 95pct<0.0032, median<0.0014
q4_1::layers.14.attention.wk.weight : mse 0.00000438, maxerr 0.01511636, 95pct<0.0042, median<0.0016
q4_1::layers.14.attention.wo.weight : mse 0.00000221, maxerr 0.02438152, 95pct<0.0028, median<0.0012
q4_1::layers.14.attention.wq.weight : mse 0.00000431, maxerr 0.02371013, 95pct<0.0040, median<0.0016
q4_1::layers.14.attention.wv.weight : mse 0.00000222, maxerr 0.00633850, 95pct<0.0028, median<0.0012
q4_1::layers.14.feed_forward.w1.weight : mse 0.00000313, maxerr 0.01666871, 95pct<0.0032, median<0.0014
q4_1::layers.14.feed_forward.w2.weight : mse 0.00000297, maxerr 0.03455403, 95pct<0.0032, median<0.0014
q4_1::layers.14.feed_forward.w3.weight : mse 0.00000309, maxerr 0.01462197, 95pct<0.0032, median<0.0016
q4_1::layers.15.attention.wk.weight : mse 0.00000446, maxerr 0.01495159, 95pct<0.0042, median<0.0016
q4_1::layers.15.attention.wo.weight : mse 0.00000222, maxerr 0.02541506, 95pct<0.0028, median<0.0012
q4_1::layers.15.attention.wq.weight : mse 0.00000431, maxerr 0.02229919, 95pct<0.0040, median<0.0016
q4_1::layers.15.attention.wv.weight : mse 0.00000223, maxerr 0.00649338, 95pct<0.0028, median<0.0012
q4_1::layers.15.feed_forward.w1.weight : mse 0.00000313, maxerr 0.01446533, 95pct<0.0032, median<0.0014
q4_1::layers.15.feed_forward.w2.weight : mse 0.00000297, maxerr 0.04414570, 95pct<0.0032, median<0.0014
q4_1::layers.15.feed_forward.w3.weight : mse 0.00000309, maxerr 0.01306508, 95pct<0.0032, median<0.0016
q4_1::layers.16.attention.wk.weight : mse 0.00000441, maxerr 0.01464437, 95pct<0.0040, median<0.0016
q4_1::layers.16.attention.wo.weight : mse 0.00000250, maxerr 0.04169917, 95pct<0.0030, median<0.0014
q4_1::layers.16.attention.wq.weight : mse 0.00000419, maxerr 0.02543133, 95pct<0.0040, median<0.0016
q4_1::layers.16.attention.wv.weight : mse 0.00000253, maxerr 0.00660706, 95pct<0.0030, median<0.0014
q4_1::layers.16.feed_forward.w1.weight : mse 0.00000317, maxerr 0.01479188, 95pct<0.0032, median<0.0014
q4_1::layers.16.feed_forward.w2.weight : mse 0.00000297, maxerr 0.03672282, 95pct<0.0032, median<0.0014
q4_1::layers.16.feed_forward.w3.weight : mse 0.00000307, maxerr 0.01314189, 95pct<0.0032, median<0.0014
q4_1::layers.17.attention.wk.weight : mse 0.00000418, maxerr 0.01311338, 95pct<0.0040, median<0.0016
q4_1::layers.17.attention.wo.weight : mse 0.00000256, maxerr 0.02875367, 95pct<0.0030, median<0.0014
q4_1::layers.17.attention.wq.weight : mse 0.00000401, maxerr 0.03082275, 95pct<0.0038, median<0.0016
q4_1::layers.17.attention.wv.weight : mse 0.00000256, maxerr 0.00813599, 95pct<0.0030, median<0.0014
q4_1::layers.17.feed_forward.w1.weight : mse 0.00000319, maxerr 0.01236165, 95pct<0.0032, median<0.0016
q4_1::layers.17.feed_forward.w2.weight : mse 0.00000299, maxerr 0.02805888, 95pct<0.0032, median<0.0014
q4_1::layers.17.feed_forward.w3.weight : mse 0.00000309, maxerr 0.01775716, 95pct<0.0032, median<0.0016
q4_1::layers.18.attention.wk.weight : mse 0.00000403, maxerr 0.01377869, 95pct<0.0040, median<0.0014
q4_1::layers.18.attention.wo.weight : mse 0.00000254, maxerr 0.03247070, 95pct<0.0030, median<0.0014
q4_1::layers.18.attention.wq.weight : mse 0.00000392, maxerr 0.02439576, 95pct<0.0038, median<0.0014
q4_1::layers.18.attention.wv.weight : mse 0.00000255, maxerr 0.00598729, 95pct<0.0030, median<0.0014
q4_1::layers.18.feed_forward.w1.weight : mse 0.00000324, maxerr 0.01477051, 95pct<0.0034, median<0.0016
q4_1::layers.18.feed_forward.w2.weight : mse 0.00000298, maxerr 0.03860271, 95pct<0.0032, median<0.0014
q4_1::layers.18.feed_forward.w3.weight : mse 0.00000307, maxerr 0.01042479, 95pct<0.0032, median<0.0014
q4_1::layers.19.attention.wk.weight : mse 0.00000388, maxerr 0.01365611, 95pct<0.0038, median<0.0014
q4_1::layers.19.attention.wo.weight : mse 0.00000276, maxerr 0.03216144, 95pct<0.0030, median<0.0014
q4_1::layers.19.attention.wq.weight : mse 0.00000378, maxerr 0.02803510, 95pct<0.0038, median<0.0014
q4_1::layers.19.attention.wv.weight : mse 0.00000280, maxerr 0.00700684, 95pct<0.0030, median<0.0014
q4_1::layers.19.feed_forward.w1.weight : mse 0.00000328, maxerr 0.01841432, 95pct<0.0034, median<0.0016
q4_1::layers.19.feed_forward.w2.weight : mse 0.00000299, maxerr 0.02656788, 95pct<0.0032, median<0.0014
q4_1::layers.19.feed_forward.w3.weight : mse 0.00000306, maxerr 0.01330259, 95pct<0.0032, median<0.0014
q4_1::layers.2.attention.wk.weight : mse 0.00000883, maxerr 0.01976573, 95pct<0.0062, median<0.0020
q4_1::layers.2.attention.wo.weight : mse 0.00000123, maxerr 0.03828126, 95pct<0.0020, median<0.0010
q4_1::layers.2.attention.wq.weight : mse 0.00000823, maxerr 0.02216390, 95pct<0.0058, median<0.0020
q4_1::layers.2.attention.wv.weight : mse 0.00000119, maxerr 0.00736135, 95pct<0.0020, median<0.0010
q4_1::layers.2.feed_forward.w1.weight : mse 0.00000315, maxerr 0.03544718, 95pct<0.0032, median<0.0016
q4_1::layers.2.feed_forward.w2.weight : mse 0.00000271, maxerr 0.05198061, 95pct<0.0030, median<0.0014
q4_1::layers.2.feed_forward.w3.weight : mse 0.00000262, maxerr 0.01909560, 95pct<0.0030, median<0.0014
q4_1::layers.20.attention.wk.weight : mse 0.00000400, maxerr 0.01639201, 95pct<0.0038, median<0.0016
q4_1::layers.20.attention.wo.weight : mse 0.00000290, maxerr 0.02312827, 95pct<0.0032, median<0.0014
q4_1::layers.20.attention.wq.weight : mse 0.00000388, maxerr 0.03564453, 95pct<0.0038, median<0.0016
q4_1::layers.20.attention.wv.weight : mse 0.00000298, maxerr 0.00713094, 95pct<0.0032, median<0.0014
q4_1::layers.20.feed_forward.w1.weight : mse 0.00000331, maxerr 0.01476848, 95pct<0.0034, median<0.0016
q4_1::layers.20.feed_forward.w2.weight : mse 0.00000300, maxerr 0.04094645, 95pct<0.0032, median<0.0014
q4_1::layers.20.feed_forward.w3.weight : mse 0.00000306, maxerr 0.01144791, 95pct<0.0032, median<0.0014
q4_1::layers.21.attention.wk.weight : mse 0.00000371, maxerr 0.01407850, 95pct<0.0038, median<0.0014
q4_1::layers.21.attention.wo.weight : mse 0.00000294, maxerr 0.04772949, 95pct<0.0032, median<0.0014
q4_1::layers.21.attention.wq.weight : mse 0.00000363, maxerr 0.02847900, 95pct<0.0038, median<0.0014
q4_1::layers.21.attention.wv.weight : mse 0.00000302, maxerr 0.00687256, 95pct<0.0032, median<0.0014
q4_1::layers.21.feed_forward.w1.weight : mse 0.00000334, maxerr 0.01481831, 95pct<0.0034, median<0.0016
q4_1::layers.21.feed_forward.w2.weight : mse 0.00000299, maxerr 0.02454491, 95pct<0.0032, median<0.0014
q4_1::layers.21.feed_forward.w3.weight : mse 0.00000305, maxerr 0.00941722, 95pct<0.0032, median<0.0014
q4_1::layers.22.attention.wk.weight : mse 0.00000382, maxerr 0.01457518, 95pct<0.0038, median<0.0016
q4_1::layers.22.attention.wo.weight : mse 0.00000296, maxerr 0.05219725, 95pct<0.0032, median<0.0014
q4_1::layers.22.attention.wq.weight : mse 0.00000375, maxerr 0.02614343, 95pct<0.0038, median<0.0016
q4_1::layers.22.attention.wv.weight : mse 0.00000298, maxerr 0.00633164, 95pct<0.0032, median<0.0014
q4_1::layers.22.feed_forward.w1.weight : mse 0.00000335, maxerr 0.01621208, 95pct<0.0034, median<0.0016
q4_1::layers.22.feed_forward.w2.weight : mse 0.00000302, maxerr 0.02524516, 95pct<0.0032, median<0.0014
q4_1::layers.22.feed_forward.w3.weight : mse 0.00000308, maxerr 0.01791126, 95pct<0.0032, median<0.0016
q4_1::layers.23.attention.wk.weight : mse 0.00000355, maxerr 0.01381835, 95pct<0.0038, median<0.0014
q4_1::layers.23.attention.wo.weight : mse 0.00000312, maxerr 0.05039060, 95pct<0.0032, median<0.0014
q4_1::layers.23.attention.wq.weight : mse 0.00000352, maxerr 0.02543131, 95pct<0.0036, median<0.0014
q4_1::layers.23.attention.wv.weight : mse 0.00000323, maxerr 0.00705466, 95pct<0.0034, median<0.0016
q4_1::layers.23.feed_forward.w1.weight : mse 0.00000336, maxerr 0.02019602, 95pct<0.0034, median<0.0016
q4_1::layers.23.feed_forward.w2.weight : mse 0.00000304, maxerr 0.02755737, 95pct<0.0032, median<0.0014
q4_1::layers.23.feed_forward.w3.weight : mse 0.00000309, maxerr 0.01500538, 95pct<0.0032, median<0.0016
q4_1::layers.24.attention.wk.weight : mse 0.00000358, maxerr 0.01357117, 95pct<0.0038, median<0.0014
q4_1::layers.24.attention.wo.weight : mse 0.00000320, maxerr 0.03517246, 95pct<0.0032, median<0.0016
q4_1::layers.24.attention.wq.weight : mse 0.00000353, maxerr 0.02697754, 95pct<0.0036, median<0.0014
q4_1::layers.24.attention.wv.weight : mse 0.00000330, maxerr 0.00695597, 95pct<0.0034, median<0.0016
q4_1::layers.24.feed_forward.w1.weight : mse 0.00000337, maxerr 0.01255596, 95pct<0.0034, median<0.0016
q4_1::layers.24.feed_forward.w2.weight : mse 0.00000307, maxerr 0.03697109, 95pct<0.0032, median<0.0016
q4_1::layers.24.feed_forward.w3.weight : mse 0.00000312, maxerr 0.01249239, 95pct<0.0032, median<0.0016
q4_1::layers.25.attention.wk.weight : mse 0.00000382, maxerr 0.01319379, 95pct<0.0038, median<0.0016
q4_1::layers.25.attention.wo.weight : mse 0.00000326, maxerr 0.03653157, 95pct<0.0034, median<0.0016
q4_1::layers.25.attention.wq.weight : mse 0.00000373, maxerr 0.02534175, 95pct<0.0036, median<0.0016
q4_1::layers.25.attention.wv.weight : mse 0.00000333, maxerr 0.00774231, 95pct<0.0034, median<0.0016
q4_1::layers.25.feed_forward.w1.weight : mse 0.00000339, maxerr 0.01365763, 95pct<0.0034, median<0.0016
q4_1::layers.25.feed_forward.w2.weight : mse 0.00000309, maxerr 0.02395630, 95pct<0.0032, median<0.0016
q4_1::layers.25.feed_forward.w3.weight : mse 0.00000315, maxerr 0.01177013, 95pct<0.0032, median<0.0016
q4_1::layers.26.attention.wk.weight : mse 0.00000370, maxerr 0.01424815, 95pct<0.0036, median<0.0016
q4_1::layers.26.attention.wo.weight : mse 0.00000345, maxerr 0.02384442, 95pct<0.0034, median<0.0016
q4_1::layers.26.attention.wq.weight : mse 0.00000361, maxerr 0.02352905, 95pct<0.0036, median<0.0016
q4_1::layers.26.attention.wv.weight : mse 0.00000353, maxerr 0.00762227, 95pct<0.0034, median<0.0016
q4_1::layers.26.feed_forward.w1.weight : mse 0.00000338, maxerr 0.02146912, 95pct<0.0034, median<0.0016
q4_1::layers.26.feed_forward.w2.weight : mse 0.00000313, maxerr 0.02818197, 95pct<0.0032, median<0.0016
q4_1::layers.26.feed_forward.w3.weight : mse 0.00000319, maxerr 0.02482224, 95pct<0.0032, median<0.0016
q4_1::layers.27.attention.wk.weight : mse 0.00000367, maxerr 0.01493329, 95pct<0.0036, median<0.0016
q4_1::layers.27.attention.wo.weight : mse 0.00000362, maxerr 0.05037433, 95pct<0.0034, median<0.0016
q4_1::layers.27.attention.wq.weight : mse 0.00000361, maxerr 0.02156782, 95pct<0.0036, median<0.0016
q4_1::layers.27.attention.wv.weight : mse 0.00000365, maxerr 0.00810165, 95pct<0.0036, median<0.0016
q4_1::layers.27.feed_forward.w1.weight : mse 0.00000338, maxerr 0.02540493, 95pct<0.0034, median<0.0016
q4_1::layers.27.feed_forward.w2.weight : mse 0.00000316, maxerr 0.02953517, 95pct<0.0032, median<0.0016
q4_1::layers.27.feed_forward.w3.weight : mse 0.00000322, maxerr 0.02640279, 95pct<0.0032, median<0.0016
q4_1::layers.28.attention.wk.weight : mse 0.00000351, maxerr 0.01595867, 95pct<0.0036, median<0.0014
q4_1::layers.28.attention.wo.weight : mse 0.00000370, maxerr 0.02981770, 95pct<0.0036, median<0.0016
q4_1::layers.28.attention.wq.weight : mse 0.00000347, maxerr 0.02494049, 95pct<0.0036, median<0.0014
q4_1::layers.28.attention.wv.weight : mse 0.00000369, maxerr 0.00791423, 95pct<0.0036, median<0.0016
q4_1::layers.28.feed_forward.w1.weight : mse 0.00000335, maxerr 0.02810669, 95pct<0.0034, median<0.0016
q4_1::layers.28.feed_forward.w2.weight : mse 0.00000319, maxerr 0.03309225, 95pct<0.0032, median<0.0016
q4_1::layers.28.feed_forward.w3.weight : mse 0.00000325, maxerr 0.02055053, 95pct<0.0032, median<0.0016
q4_1::layers.29.attention.wk.weight : mse 0.00000346, maxerr 0.01428223, 95pct<0.0036, median<0.0014
q4_1::layers.29.attention.wo.weight : mse 0.00000392, maxerr 0.03439641, 95pct<0.0036, median<0.0016
q4_1::layers.29.attention.wq.weight : mse 0.00000340, maxerr 0.02388712, 95pct<0.0036, median<0.0014
q4_1::layers.29.attention.wv.weight : mse 0.00000391, maxerr 0.00761922, 95pct<0.0036, median<0.0016
q4_1::layers.29.feed_forward.w1.weight : mse 0.00000337, maxerr 0.02038574, 95pct<0.0034, median<0.0016
q4_1::layers.29.feed_forward.w2.weight : mse 0.00000321, maxerr 0.05755107, 95pct<0.0032, median<0.0016
q4_1::layers.29.feed_forward.w3.weight : mse 0.00000329, maxerr 0.01542050, 95pct<0.0034, median<0.0016
q4_1::layers.3.attention.wk.weight : mse 0.00000611, maxerr 0.01627603, 95pct<0.0050, median<0.0018
q4_1::layers.3.attention.wo.weight : mse 0.00000167, maxerr 0.03495282, 95pct<0.0024, median<0.0012
q4_1::layers.3.attention.wq.weight : mse 0.00000552, maxerr 0.02804718, 95pct<0.0046, median<0.0018
q4_1::layers.3.attention.wv.weight : mse 0.00000167, maxerr 0.00555267, 95pct<0.0024, median<0.0012
q4_1::layers.3.feed_forward.w1.weight : mse 0.00000322, maxerr 0.02117920, 95pct<0.0032, median<0.0016
q4_1::layers.3.feed_forward.w2.weight : mse 0.00000273, maxerr 0.03491618, 95pct<0.0030, median<0.0014
q4_1::layers.3.feed_forward.w3.weight : mse 0.00000272, maxerr 0.01362103, 95pct<0.0030, median<0.0014
q4_1::layers.30.attention.wk.weight : mse 0.00000353, maxerr 0.01795453, 95pct<0.0036, median<0.0014
q4_1::layers.30.attention.wo.weight : mse 0.00000391, maxerr 0.04497075, 95pct<0.0036, median<0.0016
q4_1::layers.30.attention.wq.weight : mse 0.00000347, maxerr 0.02401968, 95pct<0.0036, median<0.0014
q4_1::layers.30.attention.wv.weight : mse 0.00000381, maxerr 0.00759277, 95pct<0.0036, median<0.0016
q4_1::layers.30.feed_forward.w1.weight : mse 0.00000341, maxerr 0.01782125, 95pct<0.0034, median<0.0016
q4_1::layers.30.feed_forward.w2.weight : mse 0.00000342, maxerr 0.12756348, 95pct<0.0032, median<0.0016
q4_1::layers.30.feed_forward.w3.weight : mse 0.00000335, maxerr 0.02418011, 95pct<0.0034, median<0.0016
q4_1::layers.31.attention.wk.weight : mse 0.00000375, maxerr 0.01362303, 95pct<0.0038, median<0.0016
q4_1::layers.31.attention.wo.weight : mse 0.00000319, maxerr 0.10161138, 95pct<0.0032, median<0.0014
q4_1::layers.31.attention.wq.weight : mse 0.00000357, maxerr 0.01820374, 95pct<0.0036, median<0.0016
q4_1::layers.31.attention.wv.weight : mse 0.00000310, maxerr 0.00997696, 95pct<0.0032, median<0.0014
q4_1::layers.31.feed_forward.w1.weight : mse 0.00000371, maxerr 0.02717184, 95pct<0.0036, median<0.0016
q4_1::layers.31.feed_forward.w2.weight : mse 0.00000336, maxerr 0.09575176, 95pct<0.0034, median<0.0016
q4_1::layers.31.feed_forward.w3.weight : mse 0.00000363, maxerr 0.03244019, 95pct<0.0034, median<0.0016
q4_1::layers.4.attention.wk.weight : mse 0.00000589, maxerr 0.01599121, 95pct<0.0048, median<0.0018
q4_1::layers.4.attention.wo.weight : mse 0.00000167, maxerr 0.02775061, 95pct<0.0024, median<0.0012
q4_1::layers.4.attention.wq.weight : mse 0.00000573, maxerr 0.02762246, 95pct<0.0046, median<0.0018
q4_1::layers.4.attention.wv.weight : mse 0.00000167, maxerr 0.00593816, 95pct<0.0024, median<0.0010
q4_1::layers.4.feed_forward.w1.weight : mse 0.00000330, maxerr 0.02607116, 95pct<0.0034, median<0.0016
q4_1::layers.4.feed_forward.w2.weight : mse 0.00000271, maxerr 0.04353638, 95pct<0.0030, median<0.0014
q4_1::layers.4.feed_forward.w3.weight : mse 0.00000274, maxerr 0.02257079, 95pct<0.0030, median<0.0014
q4_1::layers.5.attention.wk.weight : mse 0.00000527, maxerr 0.02016246, 95pct<0.0046, median<0.0016
q4_1::layers.5.attention.wo.weight : mse 0.00000171, maxerr 0.04142249, 95pct<0.0024, median<0.0012
q4_1::layers.5.attention.wq.weight : mse 0.00000516, maxerr 0.02691448, 95pct<0.0044, median<0.0016
q4_1::layers.5.attention.wv.weight : mse 0.00000173, maxerr 0.00809692, 95pct<0.0024, median<0.0012
q4_1::layers.5.feed_forward.w1.weight : mse 0.00000344, maxerr 0.02066040, 95pct<0.0034, median<0.0016
q4_1::layers.5.feed_forward.w2.weight : mse 0.00000266, maxerr 0.02678931, 95pct<0.0030, median<0.0014
q4_1::layers.5.feed_forward.w3.weight : mse 0.00000272, maxerr 0.01605021, 95pct<0.0030, median<0.0014
q4_1::layers.6.attention.wk.weight : mse 0.00000552, maxerr 0.01503804, 95pct<0.0046, median<0.0018
q4_1::layers.6.attention.wo.weight : mse 0.00000175, maxerr 0.03727213, 95pct<0.0024, median<0.0012
q4_1::layers.6.attention.wq.weight : mse 0.00000526, maxerr 0.03130698, 95pct<0.0044, median<0.0018
q4_1::layers.6.attention.wv.weight : mse 0.00000176, maxerr 0.00586955, 95pct<0.0024, median<0.0012
q4_1::layers.6.feed_forward.w1.weight : mse 0.00000334, maxerr 0.02278137, 95pct<0.0034, median<0.0016
q4_1::layers.6.feed_forward.w2.weight : mse 0.00000272, maxerr 0.03055978, 95pct<0.0030, median<0.0014
q4_1::layers.6.feed_forward.w3.weight : mse 0.00000279, maxerr 0.01386768, 95pct<0.0030, median<0.0014
q4_1::layers.7.attention.wk.weight : mse 0.00000518, maxerr 0.01637778, 95pct<0.0044, median<0.0016
q4_1::layers.7.attention.wo.weight : mse 0.00000182, maxerr 0.02817380, 95pct<0.0026, median<0.0012
q4_1::layers.7.attention.wq.weight : mse 0.00000509, maxerr 0.02885771, 95pct<0.0044, median<0.0016
q4_1::layers.7.attention.wv.weight : mse 0.00000187, maxerr 0.00640869, 95pct<0.0026, median<0.0012
q4_1::layers.7.feed_forward.w1.weight : mse 0.00000328, maxerr 0.01696777, 95pct<0.0034, median<0.0016
q4_1::layers.7.feed_forward.w2.weight : mse 0.00000274, maxerr 0.02849120, 95pct<0.0030, median<0.0014
q4_1::layers.7.feed_forward.w3.weight : mse 0.00000281, maxerr 0.01903725, 95pct<0.0030, median<0.0014
q4_1::layers.8.attention.wk.weight : mse 0.00000493, maxerr 0.01597899, 95pct<0.0044, median<0.0016
q4_1::layers.8.attention.wo.weight : mse 0.00000181, maxerr 0.02582398, 95pct<0.0026, median<0.0012
q4_1::layers.8.attention.wq.weight : mse 0.00000492, maxerr 0.02330780, 95pct<0.0044, median<0.0016
q4_1::layers.8.attention.wv.weight : mse 0.00000183, maxerr 0.00699462, 95pct<0.0026, median<0.0012
q4_1::layers.8.feed_forward.w1.weight : mse 0.00000328, maxerr 0.01851404, 95pct<0.0034, median<0.0016
q4_1::layers.8.feed_forward.w2.weight : mse 0.00000274, maxerr 0.02776897, 95pct<0.0030, median<0.0014
q4_1::layers.8.feed_forward.w3.weight : mse 0.00000283, maxerr 0.01309204, 95pct<0.0030, median<0.0014
q4_1::layers.9.attention.wk.weight : mse 0.00000468, maxerr 0.01326293, 95pct<0.0044, median<0.0016
q4_1::layers.9.attention.wo.weight : mse 0.00000178, maxerr 0.03066409, 95pct<0.0024, median<0.0012
q4_1::layers.9.attention.wq.weight : mse 0.00000461, maxerr 0.02470907, 95pct<0.0042, median<0.0016
q4_1::layers.9.attention.wv.weight : mse 0.00000180, maxerr 0.00619888, 95pct<0.0026, median<0.0012
q4_1::layers.9.feed_forward.w1.weight : mse 0.00000319, maxerr 0.02470452, 95pct<0.0034, median<0.0014
q4_1::layers.9.feed_forward.w2.weight : mse 0.00000278, maxerr 0.02815247, 95pct<0.0030, median<0.0014
q4_1::layers.9.feed_forward.w3.weight : mse 0.00000286, maxerr 0.02717841, 95pct<0.0032, median<0.0014
q4_1::output.weight : mse 0.00000251, maxerr 0.01462148, 95pct<0.0030, median<0.0014
q4_1::tok_embeddings.weight : mse 0.00000250, maxerr 0.01170197, 95pct<0.0030, median<0.0014
q4_1 : mse 0.00000318, maxerr 0.12756348, 95pct<0.0034, median<0.0014
main: total time = 240118.23 ms
I used this for 2-bit quantization, where it did make a big difference (after all, it lets you use 4 instead of 3 values). For 4-bit the effect is less pronounced, but may still be worthwhile.
Your code looks fine, but this should definitely be discussed more in depth before we decide to change this. If we do it, I believe we should update the SIMD implementations at the same time.
The increase in maximum error is probably due to the case where there are two values of similar magnitude but opposite signs. The value closer to zero gets rounded to +8 and clipped to +7. I don't know if this is bad, but if it is, we might use the old method for this case specifically.
Ideally all the SIMD implementations should be updated yes, but who has a PowerPC CPU lying around? :/ I'll have a look at the AVX versions if the perplexity run looks promising.
I think there is value in just updating the reference implementation, since that is the one we use for model quantization, maybe adding a comment to each SIMD implementation that it needs updating.
Thanks to #728, I was able to test my q2 and q3 implementations, and as expected the changes are bigger with fewer bits:
[-7,+7]
q2_0 : rmse 0.01329486, maxerr 0.92089844, 95pct<0.0246, median<0.0098
q3_0 : rmse 0.00453537, maxerr 0.41357422, 95pct<0.0086, median<0.0034
q4_0 : rmse 0.00221840, maxerr 0.14257812, 95pct<0.0040, median<0.0018
q4_1 : rmse 0.00178278, maxerr 0.12756348, 95pct<0.0034, median<0.0014
[-8,+7]
q2_0 : rmse 0.00739169, maxerr 0.87304688, 95pct<0.0140, median<0.0052
q3_0 : rmse 0.00352698, maxerr 0.37866211, 95pct<0.0066, median<0.0026
q4_0 : rmse 0.00196398, maxerr 0.18200684, 95pct<0.0036, median<0.0016
q4_1 : rmse 0.00178278, maxerr 0.12756348, 95pct<0.0034, median<0.0014
Weirdly, the maximum error goes down for q2 and q3.
> I think there is value in just updating the reference implementation,
That would be acceptable if changing the SIMD implementations would incur a significant performance degradation. But then we shouldn't really call the function "reference" anymore.
I'm not going to complete a full perplexity run on my laptop, but from a partial run on 7B the difference looks close to 0.1 perplexity.
[-7,7] 7B perplexity
[1]4.6779,[2]5.2229,[3]6.1112,[4]6.7492,[5]6.8303,[6]6.8051,[7]6.9926,[8]7.0888,[9]7.5382,[10]7.7807,[11]8.0549,[12]8.1052,[13]8.0297,[14]8.1147,[15]8.3789,[16]7.9522,[17]7.8131,[18]7.7723,[19]7.3747,[20]7.3474,[21]7.2479,[22]7.0641,[23]7.0223,[24]6.9325,[25]6.9334,[26]6.7526,[27]6.5554,[28]6.4452,[29]6.3563,[30]6.1837,[31]6.1520,[32]6.1675,[33]6.0995,[34]6.1298,[35]6.1547,[36]6.2009,[37]6.2067,[38]6.2267,[39]6.2677,[40]6.3210,[41]6.3275,[42]6.3638,[43]6.3210,[44]6.3821,[45]6.3843,[46]6.3575,[47]6.3840,[48]6.3462,[49]6.3515,[50]6.3104,[51]6.3064,[52]6.2917,[53]6.3383,[54]6.3245,[55]6.2972,[56]6.3340,[57]6.3592,[58]6.3860,[59]6.3992,[60]6.4499,[61]6.4402,[62]6.5061,[63]6.5451,[64]6.5606,[65]6.6086,[66]6.6234,[67]6.6408,[68]6.6593,[69]6.6866,[70]6.7170,[71]6.7412,[72]6.7761,[73]6.8410,[74]6.8493,[75]6.8636,[76]6.8816,[77]6.8944,[78]6.8812,[79]6.9099,[80]6.9026,[81]6.9229,[82]6.9298,[83]6.8718,[84]6.8542,[85]6.8452,[86]6.8244,[87]6.7583,[88]6.7281,[89]6.7069,[90]6.6896,[91]6.7199,[92]6.7152,[93]6.7150,[94]6.7155,[95]6.7449,[96]6.7425,[97]6.7352,[98]6.7279,[99]6.7115,[100]6.7110,[101]6.7352,[102]6.7278,[103]6.7526,[104]6.7592,[105]6.7582,[106]6.7751,[107]6.7754,[108]6.7887,[109]6.7824,[110]6.7768,[111]6.8007,[112]6.8201,[113]6.8232,[114]6.8200,[115]6.8297,[116]6.8227,[117]6.8250,[118]6.8542,[119]6.8771,[120]6.9143,[121]6.9306,[122]6.9561,[123]6.9960,[124]7.0153,[125]7.0066,[126]7.0473,[127]7.0840,[128]7.1115,[129]7.0932,[130]7.1037,[131]7.0967,[132]7.0878,[133]7.0753,[134]7.0858,[135]7.0830,[136]7.0689,[137]7.0605,[138]7.0437,[139]7.0311,[140]7.0274,[141]6.9981,[142]6.9940,[143]6.9673,[144]6.9462,[145]6.9366,[146]6.9226,[147]6.9285,[148]6.9296,[149]6.9252,[150]6.9220,[151]6.9242,[152]6.9162,[153]6.8968,[154]6.8879,[155]6.8939,[156]6.8901,[157]6.9075,[158]6.9098,[159]6.9158,[160]6.9202,[161]6.9321,[162]6.8997,[163]6.8858,[164]6.8579,[165]6.8241,[166]6.7931,[167]6.7527,[168]6.7188,[169]6.7050,[170]6.6919,[171]6.6609,[172]6.6415,[173]6.6218,[174]6.5891,[175]6.5668,[176]6.5542,[177]6.5325,[178]6.5070,[179]6.4887,[180]6.4793,[181]6.4552,[182]6.4356,[183]6.4206,[184]6.4206,[185]6.4126,[186]6.4149,[187]6.4207,[188]6.4168,[189]6.4354,[190]6.4378,[191]6.4586,[192]6.4742,[193]6.4921,[194]6.5046,[195]6.5262,[196]6.5439,[197]6.5656,[198]6.5822,[199]6.5856,[200]6.5896,[201]6.5873,[202]6.6088,[203]6.6157,[204]6.6185,[205]6.6296,[206]6.6370,[207]6.6320,[208]6.6411,[209]6.6458,[210]6.6518,[211]6.6635,[212]6.6714,[213]6.6830,[214]6.6895,[215]6.6930,[216]6.7079,[217]6.7273,[218]6.7412,[219]6.7434,[220]6.7394,[221]6.7340,[222]6.7300,[223]6.7180,[224]6.7119,[225]6.7067,[226]6.7286,[227]6.7388,[228]6.7445,[229]6.7522,[230]6.7469,[231]6.7636,[232]6.7500,[233]6.7310,[234]6.7152,[235]6.6997,[236]6.6917,[237]6.6807,[238]6.6849,[239]6.6676,[240]6.6567,[241]6.6611,[242]6.6643,[243]6.6616,[244]6.6494,[245]6.6457,[246]6.6330,[247]6.6199,[248]6.6123,[249]6.6098,[250]6.6141,[251]6.6061,[252]6.6012,[253]6.5908,[254]6.5869,[255]6.5741,[256]6.5540,[257]6.5418,[258]6.5335,[259]6.5311,[260]6.5236,[261]6.5188,[262]6.5125,[263]6.5074,[264]6.4896,[265]6.4889,[266]6.4874,[267]6.4805,
[-8,7] 7B perplexity
[1]4.7621,[2]5.1223,[3]5.9336,[4]6.6342,[5]6.7721,[6]6.7061,[7]6.9259,[8]7.0344,[9]7.4235,[10]7.6897,[11]7.9011,[12]7.9239,[13]7.8762,[14]7.9814,[15]8.2569,[16]7.8565,[17]7.7168,[18]7.6627,[19]7.2805,[20]7.2703,[21]7.1681,[22]7.0116,[23]6.9790,[24]6.8777,[25]6.8803,[26]6.7121,[27]6.5158,[28]6.4070,[29]6.3110,[30]6.1325,[31]6.0981,[32]6.1266,[33]6.0694,[34]6.1058,[35]6.1237,[36]6.1645,[37]6.1699,[38]6.1867,[39]6.2218,[40]6.2833,[41]6.3007,[42]6.3411,[43]6.2993,[44]6.3558,[45]6.3558,[46]6.3233,[47]6.3525,[48]6.3188,[49]6.3266,[50]6.2782,[51]6.2704,[52]6.2552,[53]6.3019,[54]6.2785,[55]6.2537,[56]6.2873,[57]6.3135,[58]6.3343,[59]6.3528,[60]6.4037,[61]6.3955,[62]6.4572,[63]6.4985,[64]6.5170,[65]6.5668,[66]6.5788,[67]6.5950,[68]6.6156,[69]6.6404,[70]6.6754,[71]6.6993,[72]6.7348,[73]6.7984,[74]6.8033,[75]6.8182,[76]6.8319,[77]6.8449,[78]6.8314,[79]6.8620,[80]6.8494,[81]6.8593,[82]6.8623,[83]6.8069,[84]6.7927,[85]6.7788,[86]6.7541,[87]6.6924,[88]6.6632,[89]6.6429,[90]6.6270,[91]6.6500,[92]6.6441,[93]6.6424,[94]6.6393,[95]6.6681,[96]6.6644,[97]6.6589,[98]6.6486,[99]6.6323,[100]6.6296,[101]6.6543,[102]6.6482,[103]6.6712,[104]6.6770,[105]6.6769,[106]6.6916,[107]6.6910,[108]6.7034,[109]6.6984,[110]6.6949,[111]6.7163,[112]6.7380,[113]6.7377,[114]6.7335,[115]6.7394,[116]6.7313,[117]6.7341,[118]6.7633,[119]6.7858,[120]6.8246,[121]6.8410,[122]6.8685,[123]6.9053,[124]6.9259,[125]6.9152,[126]6.9565,[127]6.9950,[128]7.0264,[129]7.0077,[130]7.0189,[131]7.0123,[132]7.0022,[133]6.9870,[134]6.9976,[135]6.9947,[136]6.9812,[137]6.9733,[138]6.9575,[139]6.9476,[140]6.9455,[141]6.9165,[142]6.9134,[143]6.8838,[144]6.8618,[145]6.8531,[146]6.8407,[147]6.8470,[148]6.8458,[149]6.8403,[150]6.8375,[151]6.8394,[152]6.8284,[153]6.8098,[154]6.7999,[155]6.8061,[156]6.8009,[157]6.8192,[158]6.8223,[159]6.8287,[160]6.8308,[161]6.8430,[162]6.8101,[163]6.7953,[164]6.7688,[165]6.7354,[166]6.7053,[167]6.6648,[168]6.6304,[169]6.6158,[170]6.6033,[171]6.5736,[172]6.5552,[173]6.5366,[174]6.5055,[175]6.4826,[176]6.4
683,[177]6.4463,[178]6.4224,[179]6.4044,[180]6.3943,[181]6.3699,[182]6.3498,[183]6.3349,[184]6.3346,[185]6.3259,[186]6.3260,[187]6.3319,[188]6.3273,[189]6.3453,[190]6.3471,[191]6.3691,[192]6.3859,[193]6.4041,[194]6.4174,[195]6.4405,[196]6.4571,[197]6.4805,[198]6.4978,[199]6.5040,[200]6.5085,[201]6.5022,[202]6.5229,[203]6.5308,[204]6.5325,[205]6.5439,[206]6.5509,[207]6.5464,[208]6.5553,[209]6.5594,[210]6.5643,[211]6.5741,[212]6.5806,[213]6.5904,[214]6.5938,[215]6.5968,[216]6.6118,[217]6.6301,[218]6.6440,[219]6.6449,[220]6.6409,[221]6.6368,[222]6.6336,[223]6.6226,[224]6.6146,[225]6.6101,[226]6.6313,[227]6.6426,[228]6.6488,[229]6.6540,[230]6.6504,[231]6.6677,[232]6.6545,[233]6.6367,[234]6.6220,[235]6.6056,[236]6.5988,[237]6.5872,[238]6.5899,[239]6.5734,[240]6.5628,[241]6.5663,[242]6.5704,[243]6.5698,[244]6.5570,[245]6.5532,[246]6.5401,[247]6.5271,[248]6.5187,[249]6.5159,[250]6.5199,[251]6.5115,[252]6.5073,[253]6.4970,[254]6.4932,[255]6.4808,[256]6.4626,[257]6.4497,[258]6.4398,[259]6.4370,[260]6.4293,[261]6.4249,[262]6.4193,[263]6.4131,[264]6.3954,[265]6.3952,[266]6.3935,[267]6.3873,
Pushed suggestions for vectorized implementations, but besides AVX and AVX2 they have not been tested. Will need someone to verify them for each architecture.
May need to bump the version to 2? Force everyone to regen the file again?
Ok, this looks promising. Lets test this some more and see if the implementation is correct. I guess no significant change in performance is observed, correct?
@howard0su Yes, if this is shown to work - we will bump the version
Read the change again. The change only applies to quantization, where it picks a better scale factor. We don't really need to bump the version; asking the user to requantize is enough.
However if we bump the version we can warn the users using outdated models that they could gain some quality by re-quantizing their models. No need to break backwards compatibility, just bump the version and show a warning if the model is old, but still accept it.
I am currently running the perplexity test on AVX2, so far the results seem similar to what @unbounded reported earlier, about 0.1 lower than before. I will try to complete the test but it is going to take a few hours.
The increase in maximum error is probably due to the case where there are two values of similar magnitude but opposite signs: the value closer to zero gets rounded to +8 and clipped to +7. I don't know if this is bad, but if it is, we could use the old method for this case specifically.
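As a toy illustration of that clipping, here is a minimal Python sketch of the full-range rounding (names and values are hypothetical, this is not the actual ggml code):

```python
def quantize_full_range(block):
    # Keep the sign of the highest-magnitude value so that it maps
    # exactly to -8 in the signed 4-bit range [-8, 7].
    amax = max(block, key=abs)
    d = amax / -8 if amax != 0 else 1.0
    # Round each value and clip to [-8, 7].
    qs = [max(-8, min(7, round(x / d))) for x in block]
    return qs, d

# Two values of similar magnitude but opposite sign:
qs, d = quantize_full_range([-0.80, 0.79])
# -0.80 maps exactly to -8, but 0.79 / 0.1 = 7.9 rounds to 8 and is
# clipped to 7, which dequantizes to 0.70, an error of about 0.09.
```

For comparison, the old method (d = max magnitude / 7) reconstructs 0.79 as 7 * 0.8/7 = 0.8, an error of only 0.01, which is consistent with the old method sometimes winning on max error.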
Trying to address the issue I raised (how important is max error though?), I came up with this:
#define THRESHOLD 9

float max = -INFINITY;
float min = +INFINITY;
for (int l = 0; l < QK; l++) {
    const float v = x[i*QK + l];
    max = MAX(max, v);
    min = MIN(min, v);
}
float d;
if (fabsf(max + min) < (max - min) / THRESHOLD) {
    d = MAX(max, -min) / 7; // max and min are close in magnitude, use old method
} else if (max > -min) {
    d = max / -8;
} else {
    d = min / -8;
}
Maybe someone can think of a better way of writing this. Anyway, the results look good:
q4_0 : rmse 0.00221840, maxerr 0.14257812, 95pct<0.0040, median<0.0018 (master)
q4_0 : rmse 0.00196398, maxerr 0.18200684, 95pct<0.0036, median<0.0016 (q4_0_range_fix)
q4_0 : rmse 0.00200958, maxerr 0.14257812, 95pct<0.0038, median<0.0016 (THRESHOLD= 16)
q4_0 : rmse 0.00197354, maxerr 0.12475586, 95pct<0.0036, median<0.0016 (THRESHOLD= 32)
q4_0 : rmse 0.00196311, maxerr 0.12475586, 95pct<0.0036, median<0.0016 (THRESHOLD= 64)
q4_0 : rmse 0.00196210, maxerr 0.15563965, 95pct<0.0036, median<0.0016 (THRESHOLD= 96)
q4_0 : rmse 0.00196209, maxerr 0.17016602, 95pct<0.0036, median<0.0016 (THRESHOLD=128)
I know that I was originally in favor of updating the SIMD code to match this, but it's getting more complicated. Maybe we have to abandon the idea of this being the "reference" implementation and just call it the "slow but low-error" method.
Very interesting experiment @sw - hard to know how much the metrics correspond to perplexity, but we could test the one with lowest RMS and see if it improves perplexity.
I don't think we need to bump the version for pure quantization improvements, it's a shame to break compatibility when there's no actual change to the format. We might see multiple different ways to quantize later, e.g. error-minimizing or GPTQ-like methods and I don't think we want a breaking change for each one.
It would be nice to know the version that generated the model, e.g. for bug reports you could tell whether a known buggy version was used, but there's no unused space in the header to put it. Maybe next time the format changes we could reserve some extra space to add metadata like "quantization method" which does not change the format.
Instead of playing with a threshold heuristic, would it be very costly to calculate the RMSE of the two quantization methods and choose the lower? Only in the reference implementation used for converting the model, of course.
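A brute-force version of that idea could look like the following Python sketch (a hypothetical illustration, not the llama.cpp implementation): quantize the block with both scale choices and keep whichever reconstructs with the lower squared error. Note it minimizes RMSE only and still ignores max error.

```python
def quantize_best_rmse(block):
    # Candidate scales: the old method (max magnitude / 7) and the
    # full-range method (signed max magnitude / -8).
    amax = max(block, key=abs)
    if amax == 0:
        return 1.0, [0] * len(block)
    candidates = [abs(amax) / 7, amax / -8]
    best = None
    for d in candidates:
        # Quantize with this scale and measure the reconstruction error.
        qs = [max(-8, min(7, round(x / d))) for x in block]
        err = sum((x - q * d) ** 2 for x, q in zip(block, qs))
        if best is None or err < best[0]:
            best = (err, d, qs)
    return best[1], best[2]
```

This roughly doubles the quantization cost, which is probably acceptable for a one-time model conversion.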
Perplexity with 7B/AVX2: 6.4481. Seems like a nice improvement compared to the current 6.5949 referenced here.
Full output
./perplexity -m ./models/7B/ggml-model-q4_0-pr729.bin -f wikitext-2-raw/wiki.test.raw -t 12
main: seed = 1680707432
llama_model_load: loading model from './models/7B/ggml-model-q4_0-pr729.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0-pr729.bin'
llama_model_load: model size = 4017.27 MB / num tensors = 291
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
32.82 seconds per pass - ETA 5.97 hours
[1]4.7864,[2]5.1923,[3]6.0580,[4]6.6469,[5]6.7840,[6]6.6861,[7]6.8975,[8]7.0032,[9]7.3668,[10]7.6322,[11]7.8826,[12]7.8969,[13]7.8458,[14]7.9136,[15]8.1790,[16]7.7510,[17]7.6185,[18]7.5804,[19]7.1999,[20]7.1923,[21]7.0955,[22]6.9284,[23]6.9032,[24]6.8079,[25]6.8055,[26]6.6372,[27]6.4405,[28]6.3437,[29]6.2436,[30]6.0763,[31]6.0417,[32]6.0649,[33]6.0080,[34]6.0413,[35]6.0561,[36]6.1001,[37]6.1098,[38]6.1198,[39]6.1548,[40]6.2143,[41]6.2246,[42]6.2736,[43]6.2295,[44]6.2881,[45]6.2919,[46]6.2637,[47]6.2876,[48]6.2559,[49]6.2656,[50]6.2187,[51]6.2115,[52]6.1977,[53]6.2443,[54]6.2268,[55]6.2017,[56]6.2385,[57]6.2608,[58]6.2831,[59]6.2997,[60]6.3468,[61]6.3371,[62]6.3958,[63]6.4326,[64]6.4467,[65]6.4979,[66]6.5092,[67]6.5236,[68]6.5424,[69]6.5677,[70]6.6007,[71]6.6227,[72]6.6560,[73]6.7181,[74]6.7241,[75]6.7409,[76]6.7538,[77]6.7676,[78]6.7559,[79]6.7869,[80]6.7788,[81]6.7907,[82]6.7976,[83]6.7431,[84]6.7283,[85]6.7173,[86]6.6938,[87]6.6322,[88]6.6030,[89]6.5835,[90]6.5689,[91]6.5954,[92]6.5896,[93]6.5869,[94]6.5823,[95]6.6139,[96]6.6106,[97]6.6091,[98]6.5988,[99]6.5836,[100]6.5833,[101]6.6076,[102]6.5991,[103]6.6217,[104]6.6300,[105]6.6317,[106]6.6485,[107]6.6473,[108]6.6607,[109]6.6539,[110]6.6510,[111]6.6715,[112]6.6928,[113]6.6929,[114]6.6905,[115]6.6967,[116]6.6870,[117]6.6893,[118]6.7186,[119]6.7401,[120]6.7758,[121]6.7935,[122]6.8201,[123]6.8598,[124]6.8791,[125]6.8693,[126]6.9105,[127]6.9484,[128]6.9800,[129]6.9619,[130]6.9731,[131]6.9686,[132]6.9605,[133]6.9473,[134]6.9554,[135]6.9514,[136]6.9384,[137]6.9309,[138]6.9133,[139]6.9017,[140]6.8982,[141]6.8700,[142]6.8654,[143]6.8376,[144]6.8182,[145]6.8086,[146]6.7943,[147]6.8015,[148]6.8003,[149]6.7961,[150]6.7935,[151]6.7949,[152]6.7843,[153]6.7655,[154]6.7557,[155]6.7634,[156]6.7578,[157]6.7752,[158]6.7796,[159]6.7857,[160]6.7876,[161]6.7992,[162]6.7671,[163]6.7544,[164]6.7283,[165]6.6938,[166]6.6656,[167]6.6252,[168]6.5919,[169]6.5780,[170]6.5659,[171]6.5364,[172]6.5185,[173]6.4986,[174]6.4678,[175]6.4441,[176]6.4
307,[177]6.4089,[178]6.3837,[179]6.3643,[180]6.3542,[181]6.3300,[182]6.3114,[183]6.2967,[184]6.2949,[185]6.2866,[186]6.2864,[187]6.2926,[188]6.2882,[189]6.3072,[190]6.3082,[191]6.3298,[192]6.3463,[193]6.3641,[194]6.3763,[195]6.3995,[196]6.4158,[197]6.4386,[198]6.4544,[199]6.4593,[200]6.4635,[201]6.4582,[202]6.4773,[203]6.4853,[204]6.4869,[205]6.4983,[206]6.5054,[207]6.5009,[208]6.5083,[209]6.5134,[210]6.5201,[211]6.5297,[212]6.5373,[213]6.5473,[214]6.5521,[215]6.5549,[216]6.5699,[217]6.5871,[218]6.6010,[219]6.6018,[220]6.5983,[221]6.5927,[222]6.5886,[223]6.5784,[224]6.5693,[225]6.5664,[226]6.5862,[227]6.5963,[228]6.6012,[229]6.6068,[230]6.6040,[231]6.6209,[232]6.6072,[233]6.5893,[234]6.5745,[235]6.5585,[236]6.5513,[237]6.5402,[238]6.5431,[239]6.5261,[240]6.5158,[241]6.5197,[242]6.5242,[243]6.5221,[244]6.5093,[245]6.5058,[246]6.4932,[247]6.4804,[248]6.4724,[249]6.4701,[250]6.4743,[251]6.4664,[252]6.4614,[253]6.4511,[254]6.4475,[255]6.4352,[256]6.4168,[257]6.4043,[258]6.3957,[259]6.3927,[260]6.3839,[261]6.3798,[262]6.3747,[263]6.3687,[264]6.3524,[265]6.3526,[266]6.3509,[267]6.3436,[268]6.3531,[269]6.3514,[270]6.3525,[271]6.3599,[272]6.3643,[273]6.3639,[274]6.3664,[275]6.3757,[276]6.3818,[277]6.3979,[278]6.4088,[279]6.4187,[280]6.4214,[281]6.4304,[282]6.4359,[283]6.4511,[284]6.4589,[285]6.4674,[286]6.4803,[287]6.4805,[288]6.4873,[289]6.4782,[290]6.4615,[291]6.4460,[292]6.4296,[293]6.4148,[294]6.4160,[295]6.4146,[296]6.4193,[297]6.4183,[298]6.4214,[299]6.4190,[300]6.4079,[301]6.4078,[302]6.3994,[303]6.3908,[304]6.3823,[305]6.3786,[306]6.3661,[307]6.3688,[308]6.3727,[309]6.3558,[310]6.3495,[311]6.3430,[312]6.3456,[313]6.3399,[314]6.3381,[315]6.3209,[316]6.3168,[317]6.3004,[318]6.2785,[319]6.2912,[320]6.3036,[321]6.3086,[322]6.3036,[323]6.2959,[324]6.2934,[325]6.3032,[326]6.3033,[327]6.3058,[328]6.3098,[329]6.3166,[330]6.3189,[331]6.3315,[332]6.3282,[333]6.3361,[334]6.3301,[335]6.3233,[336]6.3271,[337]6.3234,[338]6.3241,[339]6.3186,[340]6.3149,[341]6.3223,[342]6.3244,[343
]6.3296,[344]6.3298,[345]6.3301,[346]6.3270,[347]6.3307,[348]6.3340,[349]6.3361,[350]6.3323,[351]6.3327,[352]6.3332,[353]6.3271,[354]6.3285,[355]6.3340,[356]6.3367,[357]6.3331,[358]6.3424,[359]6.3458,[360]6.3418,[361]6.3419,[362]6.3488,[363]6.3606,[364]6.3671,[365]6.3729,[366]6.3745,[367]6.3830,[368]6.3803,[369]6.3814,[370]6.3822,[371]6.3761,[372]6.3816,[373]6.3865,[374]6.3848,[375]6.3840,[376]6.3916,[377]6.3862,[378]6.3894,[379]6.3958,[380]6.3876,[381]6.3837,[382]6.3779,[383]6.3767,[384]6.3758,[385]6.3751,[386]6.3748,[387]6.3741,[388]6.3700,[389]6.3642,[390]6.3572,[391]6.3502,[392]6.3467,[393]6.3451,[394]6.3477,[395]6.3460,[396]6.3385,[397]6.3456,[398]6.3491,[399]6.3575,[400]6.3574,[401]6.3591,[402]6.3600,[403]6.3613,[404]6.3679,[405]6.3580,[406]6.3542,[407]6.3539,[408]6.3553,[409]6.3678,[410]6.3794,[411]6.3919,[412]6.4083,[413]6.4201,[414]6.4275,[415]6.4334,[416]6.4413,[417]6.4537,[418]6.4571,[419]6.4641,[420]6.4725,[421]6.4843,[422]6.4889,[423]6.4963,[424]6.5080,[425]6.5169,[426]6.5243,[427]6.5288,[428]6.5382,[429]6.5432,[430]6.5519,[431]6.5662,[432]6.5698,[433]6.5687,[434]6.5637,[435]6.5650,[436]6.5672,[437]6.5777,[438]6.5852,[439]6.5823,[440]6.5808,[441]6.5754,[442]6.5738,[443]6.5748,[444]6.5750,[445]6.5733,[446]6.5760,[447]6.5794,[448]6.5835,[449]6.5805,[450]6.5814,[451]6.5765,[452]6.5657,[453]6.5567,[454]6.5506,[455]6.5516,[456]6.5563,[457]6.5584,[458]6.5562,[459]6.5563,[460]6.5652,[461]6.5622,[462]6.5607,[463]6.5654,[464]6.5648,[465]6.5622,[466]6.5544,[467]6.5550,[468]6.5553,[469]6.5573,[470]6.5581,[471]6.5534,[472]6.5583,[473]6.5522,[474]6.5540,[475]6.5487,[476]6.5517,[477]6.5446,[478]6.5433,[479]6.5507,[480]6.5553,[481]6.5573,[482]6.5526,[483]6.5491,[484]6.5516,[485]6.5499,[486]6.5441,[487]6.5439,[488]6.5420,[489]6.5366,[490]6.5339,[491]6.5312,[492]6.5251,[493]6.5220,[494]6.5201,[495]6.5205,[496]6.5166,[497]6.5107,[498]6.5084,[499]6.5032,[500]6.4932,[501]6.4861,[502]6.4866,[503]6.4859,[504]6.4769,[505]6.4795,[506]6.4800,[507]6.4761,[508]6.4717,[509]6.4709,
[510]6.4747,[511]6.4799,[512]6.4833,[513]6.4854,[514]6.4921,[515]6.4866,[516]6.4859,[517]6.4871,[518]6.4870,[519]6.4906,[520]6.4929,[521]6.4948,[522]6.4979,[523]6.4984,[524]6.5040,[525]6.5075,[526]6.5093,[527]6.5110,[528]6.5059,[529]6.5067,[530]6.5012,[531]6.4996,[532]6.5046,[533]6.5069,[534]6.5047,[535]6.5067,[536]6.5015,[537]6.4990,[538]6.5039,[539]6.5048,[540]6.5087,[541]6.5098,[542]6.5113,[543]6.5129,[544]6.5139,[545]6.5119,[546]6.5122,[547]6.5077,[548]6.5020,[549]6.5020,[550]6.4988,[551]6.4946,[552]6.4925,[553]6.4881,[554]6.4857,[555]6.4827,[556]6.4823,[557]6.4850,[558]6.4810,[559]6.4804,[560]6.4804,[561]6.4803,[562]6.4772,[563]6.4771,[564]6.4814,[565]6.4831,[566]6.4826,[567]6.4806,[568]6.4811,[569]6.4791,[570]6.4816,[571]6.4821,[572]6.4827,[573]6.4830,[574]6.4791,[575]6.4789,[576]6.4793,[577]6.4774,[578]6.4758,[579]6.4764,[580]6.4694,[581]6.4650,[582]6.4638,[583]6.4645,[584]6.4649,[585]6.4571,[586]6.4499,[587]6.4503,[588]6.4552,[589]6.4605,[590]6.4636,[591]6.4659,[592]6.4642,[593]6.4608,[594]6.4616,[595]6.4593,[596]6.4634,[597]6.4608,[598]6.4580,[599]6.4601,[600]6.4598,[601]6.4588,[602]6.4609,[603]6.4640,[604]6.4654,[605]6.4690,[606]6.4710,[607]6.4699,[608]6.4657,[609]6.4661,[610]6.4699,[611]6.4679,[612]6.4707,[613]6.4672,[614]6.4618,[615]6.4540,[616]6.4571,[617]6.4506,[618]6.4448,[619]6.4387,[620]6.4240,[621]6.4167,[622]6.4150,[623]6.4171,[624]6.4178,[625]6.4182,[626]6.4169,[627]6.4190,[628]6.4192,[629]6.4185,[630]6.4216,[631]6.4276,[632]6.4330,[633]6.4313,[634]6.4347,[635]6.4353,[636]6.4327,[637]6.4292,[638]6.4320,[639]6.4288,[640]6.4301,[641]6.4306,[642]6.4376,[643]6.4399,[644]6.4407,[645]6.4390,[646]6.4431,[647]6.4398,[648]6.4409,[649]6.4410,[650]6.4451,[651]6.4506,[652]6.4518,[653]6.4559,[654]6.4489,[655]6.4481,
I don't think we need to bump the version for pure quantization improvements, it's a shame to break compatibility when there's no actual change to the format.
The idea would be to keep compatibility, but show a warning to let the user know that it is possible to requantize the model for some gain.
I will have a look at dynamically selecting 7 or 8 (or even a value in between) according to RMSE, but should we just ignore the maximum error?
Essentially this is what #397 was about. There are some prototypes there, just need to select something that's not too bad in terms of performance.
This would then not be a reference or fallback implementation, but we can keep the SIMD implementations as they are and provide a matching non-SIMD one (apart from the known rounding/associativity issues with floats).
I'm not too keen on touching the version, as it would prevent older revisions from using newly generated files. Would it be acceptable to scan through part of some tensors to see if the value -8 is ever used, and print a warning if not? (during loading in llama.cpp of course, not in ggml.c) It's a bit of a hack and may affect mmap/cache behavior.
Or just use a padding space as a minor version.
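The scan idea could look something like this sketch (assuming the q4_0 encoding stores each quantized value in a nibble offset by +8, so the code 0 corresponds to -8; a hypothetical illustration, not llama.cpp code):

```python
def uses_full_range(qs_bytes):
    # Each byte packs two 4-bit codes; with the +8 offset, a nibble
    # equal to 0 encodes the value -8, which the old quantizer never
    # produced. If no nibble is 0, the tensor was likely quantized
    # with the old [-7, 7] method.
    for b in qs_bytes:
        if (b & 0x0F) == 0 or (b >> 4) == 0:
            return True
    return False

# Example: 0x70 has a low nibble of 0, i.e. the code for -8 was used.
```

A false negative is possible if a tensor happens never to hit -8 even with the new quantizer, which is why scanning only part of some tensors and printing a warning (rather than rejecting the file) would be the safer behavior.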
I believe we were discussing bumping the version to 2, but loading version 2 the same as version 1, except that we tell users who have the original file to re-run quantize to gain a better result.
Keep in mind, before and if this is merged, we have to coordinate efforts with #835 and #896
@unbounded @sw
Let's apply this approach to the Q4_2 method without RMSE optimization and compare the perplexity result against Q4_2 with RMSE (i.e. the one that is enabled by default on master, last reported as 6.2038 for 7B). We should do this before merging #1106
I'll try to figure out the change and do it for ARM NEON
Using Q4_2 without RMSE optimization (and also without the optimization from this branch), 6.2226:
cat ppl-q4_2-no-rmse.txt
main: seed = 1682163830
llama.cpp: loading model from ../models/7B/ggml-model-q4_2-no-rmse.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 |
NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.74 seconds per pass - ETA 19 minutes
[1]4.3735,[2]4.9293,[3]5.7896,[4]6.4077,[5]6.4991,[6]6.4768,[7]6.6610,[8]6.7477,[9]7.1219,[10]7.3704,[11]7.5920,[12]7.6235,[13]7.5463,[14]7.6191,[15]7.8629,[16]7.4640,[17]7.3420,[18]7.2945,[19]6.9242,[20]6.9127,[21]6.8089,[22]6.6299,[23]6.5925,[24]6.4997,[25]6.5029,[26]6.3341,[27]6.1599,[28]6.0600,[29]5.9736,[30]5.8153,[31]5.7828,[32]5.8033,[33]5.7465,[34]5.7830,[35]5.8114,[36]5.8524,[37]5.8535,[38]5.8713,[39]5.9074,[40]5.9653,[41]5.9774,[42]6.0155,[43]5.9764,[44]6.0300,[45]6.0355,[46]6.0079,[47]6.0289,[48]6.0008,[49]6.0048,[50]5.9646,[51]5.9617,[52]5.9532,[53]5.9963,[54]5.9796,[55]5.9570,[56]5.9921,[57]6.0160,[58]6.0408,[59]6.0566,[60]6.1009,[61]6.0944,[62]6.1547,[63]6.1878,[64]6.2049,[65]6.2511,[66]6.2605,[67]6.2788,[68]6.2949,[69]6.3208,[70]6.3540,[71]6.3749,[72]6.4052,[73]6.4668,[74]6.4694,[75]6.4832,[76]6.4988,[77]6.5140,[78]6.5002,[79]6.5291,[80]6.5213,[81]6.5319,[82]6.5371,[83]6.4828,[84]6.4654,[85]6.4531,[86]6.4312,[87]6.3655,[88]6.3374,[89]6.3173,[90]6.3025,[91]6.3266,[92]6.3237,[93]6.3270,[94]6.3241,[95]6.3520,[96]6.3519,[97]6.3468,[98]6.3427,[99]6.3302,[100]6.3316,[101]6.3577,[102]6.3512,[103]6.3723,[104]6.3796,[105]6.3782,[106]6.3957,[107]6.3945,[108]6.4077,[109]6.4027,[110]6.3993,[111]6.4223,[112]6.4419,[113]6.4444,[114]6.4413,[115]6.4493,[116]6.4418,[117]6.4470,[118]6.4762,[119]6.4978,[120]6.5344,[121]6.5508,[122]6.5752,[123]6.6131,[124]6.6300,[125]6.6203,[126]6.6582,[127]6.6944,[128]6.7231,[129]6.7055,[130]6.7160,[131]6.7120,[132]6.7030,[133]6.6899,[134]6.7009,[135]6.6962,[136]6.6838,[137]6.6760,[138]6.6584,[139]6.6469,[140]6.6427,[141]6.6124,[142]6.6075,[143]6.5789,[144]6.5586,[145]6.5483,[146]6.5358,[147]6.5427,[148]6.5439,[149]6.5381,[150]6.5331,[151]6.5356,[152]6.5266,[153]6.5098,[154]6.5010,[155]6.5077,[156]6.5025,[157]6.5204,[158]6.5251,[159]6.5288,[160]6.5316,[161]6.5432,[162]6.5134,[163]6.5009,[164]6.4761,[165]6.4448,[166]6.4160,[167]6.3779,[168]6.3460,[169]6.3326,[170]6.3213,[171]6.2930,[172]6.2745,[173]6.2558,[174]6.2247,[175]6.2034,[176]6.1
931,[177]6.1722,[178]6.1487,[179]6.1319,[180]6.1220,[181]6.0998,[182]6.0818,[183]6.0679,[184]6.0680,[185]6.0606,[186]6.0615,[187]6.0673,[188]6.0635,[189]6.0815,[190]6.0823,[191]6.1033,[192]6.1202,[193]6.1376,[194]6.1490,[195]6.1703,[196]6.1860,[197]6.2077,[198]6.2230,[199]6.2259,[200]6.2305,[201]6.2261,[202]6.2458,[203]6.2531,[204]6.2511,[205]6.2619,[206]6.2689,[207]6.2648,[208]6.2741,[209]6.2786,[210]6.2840,[211]6.2934,[212]6.3010,[213]6.3117,[214]6.3151,[215]6.3186,[216]6.3337,[217]6.3512,[218]6.3645,[219]6.3638,[220]6.3605,[221]6.3548,[222]6.3517,[223]6.3407,[224]6.3345,[225]6.3304,[226]6.3515,[227]6.3602,[228]6.3650,[229]6.3716,[230]6.3678,[231]6.3844,[232]6.3721,[233]6.3549,[234]6.3400,[235]6.3226,[236]6.3149,[237]6.3048,[238]6.3080,[239]6.2924,[240]6.2824,[241]6.2855,[242]6.2893,[243]6.2877,[244]6.2760,[245]6.2733,[246]6.2618,[247]6.2496,[248]6.2424,[249]6.2405,[250]6.2455,[251]6.2378,[252]6.2342,[253]6.2242,[254]6.2201,[255]6.2084,[256]6.1902,[257]6.1789,[258]6.1703,[259]6.1688,[260]6.1613,[261]6.1573,[262]6.1515,[263]6.1466,[264]6.1258,[265]6.1256,[266]6.1247,[267]6.1179,[268]6.1275,[269]6.1254,[270]6.1263,[271]6.1338,[272]6.1379,[273]6.1380,[274]6.1395,[275]6.1485,[276]6.1538,[277]6.1698,[278]6.1813,[279]6.1901,[280]6.1935,[281]6.2029,[282]6.2094,[283]6.2245,[284]6.2319,[285]6.2411,[286]6.2547,[287]6.2539,[288]6.2599,[289]6.2504,[290]6.2349,[291]6.2195,[292]6.2040,[293]6.1908,[294]6.1924,[295]6.1921,[296]6.1967,[297]6.1950,[298]6.1984,[299]6.1954,[300]6.1845,[301]6.1847,[302]6.1772,[303]6.1690,[304]6.1611,[305]6.1583,[306]6.1456,[307]6.1484,[308]6.1524,[309]6.1360,[310]6.1301,[311]6.1235,[312]6.1259,[313]6.1207,[314]6.1193,[315]6.1028,[316]6.0980,[317]6.0816,[318]6.0602,[319]6.0726,[320]6.0854,[321]6.0895,[322]6.0851,[323]6.0785,[324]6.0760,[325]6.0863,[326]6.0864,[327]6.0883,[328]6.0925,[329]6.0981,[330]6.1007,[331]6.1136,[332]6.1102,[333]6.1168,[334]6.1109,[335]6.1046,[336]6.1083,[337]6.1056,[338]6.1046,[339]6.0993,[340]6.0952,[341]6.1033,[342]6.1060,[343
]6.1114,[344]6.1113,[345]6.1111,[346]6.1082,[347]6.1131,[348]6.1168,[349]6.1185,[350]6.1151,[351]6.1159,[352]6.1164,[353]6.1109,[354]6.1110,[355]6.1162,[356]6.1190,[357]6.1156,[358]6.1249,[359]6.1281,[360]6.1241,[361]6.1234,[362]6.1299,[363]6.1413,[364]6.1477,[365]6.1535,[366]6.1546,[367]6.1634,[368]6.1609,[369]6.1618,[370]6.1631,[371]6.1571,[372]6.1619,[373]6.1674,[374]6.1660,[375]6.1656,[376]6.1732,[377]6.1681,[378]6.1706,[379]6.1763,[380]6.1682,[381]6.1639,[382]6.1586,[383]6.1577,[384]6.1569,[385]6.1555,[386]6.1551,[387]6.1546,[388]6.1502,[389]6.1448,[390]6.1376,[391]6.1298,[392]6.1257,[393]6.1237,[394]6.1261,[395]6.1246,[396]6.1169,[397]6.1244,[398]6.1281,[399]6.1361,[400]6.1355,[401]6.1371,[402]6.1378,[403]6.1397,[404]6.1460,[405]6.1365,[406]6.1332,[407]6.1328,[408]6.1341,[409]6.1461,[410]6.1572,[411]6.1689,[412]6.1848,[413]6.1962,[414]6.2038,[415]6.2092,[416]6.2171,[417]6.2297,[418]6.2333,[419]6.2403,[420]6.2492,[421]6.2611,[422]6.2663,[423]6.2731,[424]6.2847,[425]6.2938,[426]6.3005,[427]6.3050,[428]6.3134,[429]6.3184,[430]6.3273,[431]6.3417,[432]6.3457,[433]6.3447,[434]6.3399,[435]6.3407,[436]6.3430,[437]6.3524,[438]6.3601,[439]6.3568,[440]6.3560,[441]6.3508,[442]6.3496,[443]6.3510,[444]6.3514,[445]6.3494,[446]6.3518,[447]6.3547,[448]6.3594,[449]6.3571,[450]6.3576,[451]6.3532,[452]6.3410,[453]6.3323,[454]6.3263,[455]6.3272,[456]6.3322,[457]6.3342,[458]6.3322,[459]6.3329,[460]6.3415,[461]6.3388,[462]6.3375,[463]6.3420,[464]6.3411,[465]6.3382,[466]6.3303,[467]6.3306,[468]6.3303,[469]6.3327,[470]6.3333,[471]6.3286,[472]6.3332,[473]6.3276,[474]6.3290,[475]6.3230,[476]6.3247,[477]6.3172,[478]6.3161,[479]6.3225,[480]6.3275,[481]6.3297,[482]6.3252,[483]6.3211,[484]6.3232,[485]6.3210,[486]6.3155,[487]6.3155,[488]6.3134,[489]6.3086,[490]6.3061,[491]6.3032,[492]6.2975,[493]6.2944,[494]6.2925,[495]6.2923,[496]6.2886,[497]6.2830,[498]6.2812,[499]6.2764,[500]6.2668,[501]6.2601,[502]6.2602,[503]6.2597,[504]6.2504,[505]6.2528,[506]6.2537,[507]6.2478,[508]6.2436,[509]6.2427,
[510]6.2464,[511]6.2509,[512]6.2548,[513]6.2568,[514]6.2632,[515]6.2577,[516]6.2567,[517]6.2577,[518]6.2579,[519]6.2607,[520]6.2634,[521]6.2649,[522]6.2678,[523]6.2687,[524]6.2744,[525]6.2780,[526]6.2791,[527]6.2813,[528]6.2763,[529]6.2768,[530]6.2719,[531]6.2706,[532]6.2757,[533]6.2779,[534]6.2764,[535]6.2789,[536]6.2736,[537]6.2712,[538]6.2760,[539]6.2768,[540]6.2809,[541]6.2816,[542]6.2826,[543]6.2839,[544]6.2852,[545]6.2830,[546]6.2838,[547]6.2794,[548]6.2742,[549]6.2742,[550]6.2714,[551]6.2677,[552]6.2659,[553]6.2617,[554]6.2592,[555]6.2566,[556]6.2561,[557]6.2582,[558]6.2542,[559]6.2539,[560]6.2535,[561]6.2537,[562]6.2516,[563]6.2518,[564]6.2564,[565]6.2583,[566]6.2580,[567]6.2558,[568]6.2564,[569]6.2546,[570]6.2573,[571]6.2577,[572]6.2588,[573]6.2588,[574]6.2554,[575]6.2549,[576]6.2550,[577]6.2538,[578]6.2518,[579]6.2527,[580]6.2459,[581]6.2421,[582]6.2410,[583]6.2417,[584]6.2421,[585]6.2346,[586]6.2278,[587]6.2281,[588]6.2332,[589]6.2388,[590]6.2419,[591]6.2440,[592]6.2424,[593]6.2390,[594]6.2400,[595]6.2376,[596]6.2411,[597]6.2387,[598]6.2354,[599]6.2374,[600]6.2371,[601]6.2355,[602]6.2370,[603]6.2400,[604]6.2410,[605]6.2444,[606]6.2463,[607]6.2445,[608]6.2411,[609]6.2418,[610]6.2453,[611]6.2432,[612]6.2457,[613]6.2421,[614]6.2371,[615]6.2295,[616]6.2324,[617]6.2260,[618]6.2205,[619]6.2150,[620]6.2007,[621]6.1933,[622]6.1915,[623]6.1931,[624]6.1934,[625]6.1935,[626]6.1921,[627]6.1941,[628]6.1943,[629]6.1939,[630]6.1974,[631]6.2034,[632]6.2088,[633]6.2072,[634]6.2103,[635]6.2109,[636]6.2081,[637]6.2048,[638]6.2076,[639]6.2045,[640]6.2055,[641]6.2057,[642]6.2125,[643]6.2146,[644]6.2156,[645]6.2136,[646]6.2177,[647]6.2140,[648]6.2149,[649]6.2150,[650]6.2186,[651]6.2243,[652]6.2252,[653]6.2295,[654]6.2231,[655]6.2226,
llama_print_timings: load time = 3645.86 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1072909.33 ms / 335360 tokens ( 3.20 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1104558.69 ms
real 18m24,753s
user 214m21,276s
sys 0m14,005s
Q4_2 with RMSE optimization: 6.2038
Q4_2 without RMSE optimization, with the optimization from this branch: TBD
I have rebased this branch on latest master, updated the Q4_2 quantization, and will now do the following runs with 7B:
Method | RMSE | Full Range
---|---|---
Q4_0 | 6.2681 | 6.2103
Q4_2 | 6.2027 | 6.1698
Also, is there an extension of this approach to Q4_1 and Q4_3?
Q4_0 with "RMSE optimized"
main: seed = 1682179032
llama.cpp: loading model from ../models/7B/ggml-model-q4_0-rmse.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.68 seconds per pass - ETA 18 minutes
[1]4.4719,[2]5.0037,[3]5.8738,[4]6.4850,[5]6.5898,[6]6.5639,[7]6.7538,[8]6.8514,[9]7.1961,[10]7.4444,[11]7.6858,[12]7.7115,[13]7.6405,[14]7.7258,[15]7.9743,[16]7.5710,[17]7.4437,[18]7.3939,[19]7.0227,[20]7.0154,[21]6.9210,[22]6.7521,[23]6.7172,[24]6.6156,[25]6.6180,[26]6.4465,[27]6.2587,[28]6.1590,[29]6.0640,[30]5.9061,[31]5.8690,[32]5.8913,[33]5.8322,[34]5.8669,[35]5.8914,[36]5.9338,[37]5.9332,[38]5.9492,[39]5.9837,[40]6.0484,[41]6.0599,[42]6.1015,[43]6.0565,[44]6.1130,[45]6.1174,[46]6.0914,[47]6.1139,[48]6.0866,[49]6.0908,[50]6.0488,[51]6.0418,[52]6.0292,[53]6.0742,[54]6.0571,[55]6.0320,[56]6.0635,[57]6.0831,[58]6.1039,[59]6.1227,[60]6.1674,[61]6.1579,[62]6.2171,[63]6.2541,[64]6.2680,[65]6.3145,[66]6.3229,[67]6.3413,[68]6.3545,[69]6.3824,[70]6.4167,[71]6.4379,[72]6.4695,[73]6.5330,[74]6.5385,[75]6.5543,[76]6.5641,[77]6.5740,[78]6.5597,[79]6.5884,[80]6.5809,[81]6.5897,[82]6.5936,[83]6.5380,[84]6.5224,[85]6.5104,[86]6.4885,[87]6.4250,[88]6.3956,[89]6.3772,[90]6.3626,[91]6.3855,[92]6.3807,[93]6.3819,[94]6.3775,[95]6.4066,[96]6.4042,[97]6.3994,[98]6.3906,[99]6.3752,[100]6.3762,[101]6.4027,[102]6.3960,[103]6.4164,[104]6.4237,[105]6.4241,[106]6.4402,[107]6.4388,[108]6.4496,[109]6.4429,[110]6.4395,[111]6.4620,[112]6.4826,[113]6.4863,[114]6.4824,[115]6.4901,[116]6.4808,[117]6.4856,[118]6.5151,[119]6.5367,[120]6.5739,[121]6.5901,[122]6.6148,[123]6.6517,[124]6.6700,[125]6.6598,[126]6.7015,[127]6.7395,[128]6.7700,[129]6.7544,[130]6.7654,[131]6.7603,[132]6.7516,[133]6.7386,[134]6.7498,[135]6.7456,[136]6.7335,[137]6.7254,[138]6.7091,[139]6.6976,[140]6.6931,[141]6.6624,[142]6.6580,[143]6.6283,[144]6.6077,[145]6.6001,[146]6.5871,[147]6.5936,[148]6.5947,[149]6.5888,[150]6.5842,[151]6.5857,[152]6.5739,[153]6.5568,[154]6.5481,[155]6.5551,[156]6.5501,[157]6.5686,[158]6.5719,[159]6.5767,[160]6.5781,[161]6.5908,[162]6.5609,[163]6.5481,[164]6.5228,[165]6.4909,[166]6.4621,[167]6.4240,[168]6.3920,[169]6.3783,[170]6.3673,[171]6.3383,[172]6.3201,[173]6.3016,[174]6.2710,[175]6.2484,[176]6.2
376,[177]6.2167,[178]6.1941,[179]6.1764,[180]6.1667,[181]6.1449,[182]6.1273,[183]6.1131,[184]6.1132,[185]6.1051,[186]6.1058,[187]6.1118,[188]6.1078,[189]6.1248,[190]6.1262,[191]6.1486,[192]6.1654,[193]6.1827,[194]6.1942,[195]6.2159,[196]6.2319,[197]6.2537,[198]6.2691,[199]6.2735,[200]6.2785,[201]6.2738,[202]6.2942,[203]6.3026,[204]6.3015,[205]6.3131,[206]6.3208,[207]6.3176,[208]6.3256,[209]6.3296,[210]6.3346,[211]6.3446,[212]6.3518,[213]6.3622,[214]6.3646,[215]6.3683,[216]6.3832,[217]6.4017,[218]6.4161,[219]6.4168,[220]6.4130,[221]6.4078,[222]6.4050,[223]6.3945,[224]6.3871,[225]6.3821,[226]6.4028,[227]6.4112,[228]6.4161,[229]6.4209,[230]6.4171,[231]6.4335,[232]6.4210,[233]6.4040,[234]6.3894,[235]6.3727,[236]6.3654,[237]6.3552,[238]6.3587,[239]6.3427,[240]6.3325,[241]6.3351,[242]6.3398,[243]6.3379,[244]6.3262,[245]6.3234,[246]6.3115,[247]6.2986,[248]6.2905,[249]6.2884,[250]6.2923,[251]6.2847,[252]6.2810,[253]6.2710,[254]6.2678,[255]6.2560,[256]6.2375,[257]6.2264,[258]6.2180,[259]6.2164,[260]6.2086,[261]6.2047,[262]6.1992,[263]6.1941,[264]6.1740,[265]6.1732,[266]6.1717,[267]6.1647,[268]6.1741,[269]6.1725,[270]6.1737,[271]6.1813,[272]6.1846,[273]6.1846,[274]6.1872,[275]6.1954,[276]6.2018,[277]6.2176,[278]6.2286,[279]6.2377,[280]6.2405,[281]6.2492,[282]6.2552,[283]6.2698,[284]6.2778,[285]6.2865,[286]6.2998,[287]6.2998,[288]6.3060,[289]6.2968,[290]6.2811,[291]6.2659,[292]6.2504,[293]6.2364,[294]6.2382,[295]6.2378,[296]6.2422,[297]6.2409,[298]6.2435,[299]6.2406,[300]6.2293,[301]6.2299,[302]6.2222,[303]6.2143,[304]6.2067,[305]6.2033,[306]6.1907,[307]6.1926,[308]6.1959,[309]6.1797,[310]6.1739,[311]6.1675,[312]6.1703,[313]6.1647,[314]6.1629,[315]6.1461,[316]6.1411,[317]6.1245,[318]6.1029,[319]6.1152,[320]6.1280,[321]6.1323,[322]6.1277,[323]6.1209,[324]6.1181,[325]6.1284,[326]6.1284,[327]6.1304,[328]6.1346,[329]6.1407,[330]6.1440,[331]6.1566,[332]6.1538,[333]6.1606,[334]6.1552,[335]6.1485,[336]6.1515,[337]6.1489,[338]6.1485,[339]6.1432,[340]6.1385,[341]6.1465,[342]6.1490,[343
]6.1545,[344]6.1546,[345]6.1542,[346]6.1514,[347]6.1566,[348]6.1604,[349]6.1624,[350]6.1593,[351]6.1600,[352]6.1608,[353]6.1548,[354]6.1552,[355]6.1601,[356]6.1627,[357]6.1589,[358]6.1681,[359]6.1709,[360]6.1668,[361]6.1667,[362]6.1735,[363]6.1848,[364]6.1911,[365]6.1970,[366]6.1979,[367]6.2071,[368]6.2044,[369]6.2045,[370]6.2057,[371]6.1998,[372]6.2048,[373]6.2105,[374]6.2090,[375]6.2087,[376]6.2157,[377]6.2109,[378]6.2135,[379]6.2196,[380]6.2118,[381]6.2080,[382]6.2026,[383]6.2019,[384]6.2012,[385]6.2006,[386]6.2002,[387]6.1997,[388]6.1957,[389]6.1904,[390]6.1835,[391]6.1756,[392]6.1712,[393]6.1692,[394]6.1719,[395]6.1701,[396]6.1623,[397]6.1702,[398]6.1735,[399]6.1817,[400]6.1814,[401]6.1828,[402]6.1838,[403]6.1853,[404]6.1917,[405]6.1817,[406]6.1785,[407]6.1776,[408]6.1791,[409]6.1910,[410]6.2019,[411]6.2138,[412]6.2299,[413]6.2420,[414]6.2493,[415]6.2547,[416]6.2623,[417]6.2747,[418]6.2785,[419]6.2863,[420]6.2950,[421]6.3069,[422]6.3121,[423]6.3192,[424]6.3315,[425]6.3405,[426]6.3472,[427]6.3516,[428]6.3600,[429]6.3654,[430]6.3734,[431]6.3877,[432]6.3921,[433]6.3911,[434]6.3866,[435]6.3872,[436]6.3892,[437]6.3989,[438]6.4068,[439]6.4037,[440]6.4031,[441]6.3978,[442]6.3967,[443]6.3980,[444]6.3982,[445]6.3965,[446]6.3989,[447]6.4020,[448]6.4064,[449]6.4037,[450]6.4046,[451]6.4001,[452]6.3881,[453]6.3794,[454]6.3737,[455]6.3751,[456]6.3800,[457]6.3822,[458]6.3799,[459]6.3803,[460]6.3888,[461]6.3858,[462]6.3839,[463]6.3890,[464]6.3880,[465]6.3849,[466]6.3772,[467]6.3772,[468]6.3771,[469]6.3790,[470]6.3795,[471]6.3747,[472]6.3795,[473]6.3739,[474]6.3750,[475]6.3690,[476]6.3714,[477]6.3641,[478]6.3627,[479]6.3689,[480]6.3735,[481]6.3752,[482]6.3706,[483]6.3663,[484]6.3686,[485]6.3664,[486]6.3609,[487]6.3611,[488]6.3588,[489]6.3540,[490]6.3515,[491]6.3488,[492]6.3425,[493]6.3394,[494]6.3379,[495]6.3380,[496]6.3344,[497]6.3288,[498]6.3272,[499]6.3224,[500]6.3123,[501]6.3055,[502]6.3054,[503]6.3047,[504]6.2954,[505]6.2977,[506]6.2987,[507]6.2933,[508]6.2897,[509]6.2889,
[510]6.2929,[511]6.2977,[512]6.3012,[513]6.3033,[514]6.3101,[515]6.3047,[516]6.3039,[517]6.3050,[518]6.3053,[519]6.3087,[520]6.3114,[521]6.3130,[522]6.3158,[523]6.3166,[524]6.3221,[525]6.3261,[526]6.3275,[527]6.3291,[528]6.3241,[529]6.3246,[530]6.3199,[531]6.3187,[532]6.3233,[533]6.3255,[534]6.3235,[535]6.3256,[536]6.3199,[537]6.3176,[538]6.3223,[539]6.3231,[540]6.3270,[541]6.3275,[542]6.3287,[543]6.3301,[544]6.3316,[545]6.3292,[546]6.3302,[547]6.3259,[548]6.3209,[549]6.3206,[550]6.3176,[551]6.3136,[552]6.3116,[553]6.3076,[554]6.3052,[555]6.3022,[556]6.3020,[557]6.3043,[558]6.3004,[559]6.2997,[560]6.2998,[561]6.2996,[562]6.2977,[563]6.2974,[564]6.3016,[565]6.3036,[566]6.3034,[567]6.3013,[568]6.3016,[569]6.3002,[570]6.3027,[571]6.3031,[572]6.3039,[573]6.3040,[574]6.3004,[575]6.3003,[576]6.3005,[577]6.2992,[578]6.2972,[579]6.2979,[580]6.2910,[581]6.2870,[582]6.2858,[583]6.2866,[584]6.2868,[585]6.2791,[586]6.2723,[587]6.2728,[588]6.2774,[589]6.2830,[590]6.2860,[591]6.2879,[592]6.2863,[593]6.2828,[594]6.2836,[595]6.2812,[596]6.2849,[597]6.2827,[598]6.2795,[599]6.2816,[600]6.2811,[601]6.2796,[602]6.2812,[603]6.2843,[604]6.2853,[605]6.2887,[606]6.2906,[607]6.2890,[608]6.2854,[609]6.2859,[610]6.2896,[611]6.2876,[612]6.2902,[613]6.2865,[614]6.2812,[615]6.2736,[616]6.2764,[617]6.2705,[618]6.2653,[619]6.2598,[620]6.2455,[621]6.2381,[622]6.2363,[623]6.2380,[624]6.2383,[625]6.2383,[626]6.2369,[627]6.2388,[628]6.2388,[629]6.2382,[630]6.2415,[631]6.2478,[632]6.2533,[633]6.2516,[634]6.2549,[635]6.2555,[636]6.2528,[637]6.2497,[638]6.2525,[639]6.2494,[640]6.2504,[641]6.2509,[642]6.2576,[643]6.2597,[644]6.2607,[645]6.2587,[646]6.2630,[647]6.2592,[648]6.2603,[649]6.2604,[650]6.2645,[651]6.2702,[652]6.2711,[653]6.2752,[654]6.2687,[655]6.2681,
llama_print_timings: load time = 3969.63 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1011871.41 ms / 335360 tokens ( 3.02 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1041931.08 ms
Q4_0 with "Full Range"
main: seed = 1682177985
llama.cpp: loading model from ../models/7B/ggml-model-q4_0-rf.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.65 seconds per pass - ETA 18 minutes
[1]4.4634,[2]4.9470,[3]5.8110,[4]6.4587,[5]6.5722,[6]6.5029,[7]6.6829,[8]6.7980,[9]7.1438,[10]7.3895,[11]7.6123,[12]7.6306,[13]7.5695,[14]7.6575,[15]7.9079,[16]7.5038,[17]7.3808,[18]7.3277,[19]6.9639,[20]6.9577,[21]6.8623,[22]6.6916,[23]6.6608,[24]6.5638,[25]6.5639,[26]6.4027,[27]6.2189,[28]6.1196,[29]6.0227,[30]5.8657,[31]5.8342,[32]5.8548,[33]5.7930,[34]5.8244,[35]5.8457,[36]5.8834,[37]5.8876,[38]5.9011,[39]5.9347,[40]5.9983,[41]6.0113,[42]6.0517,[43]6.0119,[44]6.0652,[45]6.0705,[46]6.0438,[47]6.0682,[48]6.0405,[49]6.0465,[50]6.0019,[51]5.9957,[52]5.9854,[53]6.0305,[54]6.0126,[55]5.9873,[56]6.0180,[57]6.0399,[58]6.0604,[59]6.0791,[60]6.1256,[61]6.1173,[62]6.1750,[63]6.2103,[64]6.2237,[65]6.2714,[66]6.2793,[67]6.2954,[68]6.3110,[69]6.3363,[70]6.3685,[71]6.3876,[72]6.4199,[73]6.4823,[74]6.4875,[75]6.5028,[76]6.5134,[77]6.5257,[78]6.5106,[79]6.5406,[80]6.5317,[81]6.5396,[82]6.5430,[83]6.4901,[84]6.4738,[85]6.4610,[86]6.4386,[87]6.3747,[88]6.3465,[89]6.3273,[90]6.3124,[91]6.3355,[92]6.3303,[93]6.3310,[94]6.3278,[95]6.3582,[96]6.3552,[97]6.3515,[98]6.3430,[99]6.3283,[100]6.3267,[101]6.3524,[102]6.3463,[103]6.3671,[104]6.3741,[105]6.3735,[106]6.3892,[107]6.3877,[108]6.3994,[109]6.3936,[110]6.3908,[111]6.4127,[112]6.4334,[113]6.4348,[114]6.4315,[115]6.4378,[116]6.4291,[117]6.4343,[118]6.4628,[119]6.4831,[120]6.5190,[121]6.5347,[122]6.5603,[123]6.5966,[124]6.6157,[125]6.6061,[126]6.6460,[127]6.6829,[128]6.7131,[129]6.6963,[130]6.7060,[131]6.7010,[132]6.6919,[133]6.6784,[134]6.6884,[135]6.6847,[136]6.6715,[137]6.6637,[138]6.6471,[139]6.6365,[140]6.6324,[141]6.6030,[142]6.5997,[143]6.5710,[144]6.5506,[145]6.5428,[146]6.5296,[147]6.5363,[148]6.5367,[149]6.5309,[150]6.5275,[151]6.5284,[152]6.5172,[153]6.5004,[154]6.4914,[155]6.4983,[156]6.4931,[157]6.5115,[158]6.5150,[159]6.5206,[160]6.5227,[161]6.5339,[162]6.5041,[163]6.4916,[164]6.4671,[165]6.4354,[166]6.4075,[167]6.3697,[168]6.3377,[169]6.3242,[170]6.3124,[171]6.2847,[172]6.2673,[173]6.2495,[174]6.2194,[175]6.1970,[176]6.1
857,[177]6.1640,[178]6.1405,[179]6.1230,[180]6.1132,[181]6.0911,[182]6.0736,[183]6.0591,[184]6.0584,[185]6.0506,[186]6.0513,[187]6.0570,[188]6.0527,[189]6.0706,[190]6.0722,[191]6.0941,[192]6.1103,[193]6.1276,[194]6.1390,[195]6.1612,[196]6.1772,[197]6.1997,[198]6.2149,[199]6.2190,[200]6.2242,[201]6.2190,[202]6.2395,[203]6.2470,[204]6.2459,[205]6.2570,[206]6.2639,[207]6.2601,[208]6.2682,[209]6.2726,[210]6.2785,[211]6.2880,[212]6.2951,[213]6.3052,[214]6.3077,[215]6.3112,[216]6.3265,[217]6.3444,[218]6.3581,[219]6.3582,[220]6.3544,[221]6.3499,[222]6.3464,[223]6.3374,[224]6.3295,[225]6.3257,[226]6.3457,[227]6.3545,[228]6.3597,[229]6.3645,[230]6.3612,[231]6.3779,[232]6.3652,[233]6.3482,[234]6.3337,[235]6.3165,[236]6.3099,[237]6.2996,[238]6.3025,[239]6.2867,[240]6.2765,[241]6.2797,[242]6.2842,[243]6.2830,[244]6.2710,[245]6.2683,[246]6.2566,[247]6.2443,[248]6.2366,[249]6.2344,[250]6.2385,[251]6.2307,[252]6.2267,[253]6.2169,[254]6.2136,[255]6.2023,[256]6.1840,[257]6.1723,[258]6.1637,[259]6.1613,[260]6.1533,[261]6.1493,[262]6.1435,[263]6.1385,[264]6.1195,[265]6.1191,[266]6.1168,[267]6.1097,[268]6.1191,[269]6.1172,[270]6.1182,[271]6.1260,[272]6.1302,[273]6.1304,[274]6.1327,[275]6.1414,[276]6.1474,[277]6.1625,[278]6.1729,[279]6.1824,[280]6.1851,[281]6.1940,[282]6.2001,[283]6.2144,[284]6.2224,[285]6.2312,[286]6.2449,[287]6.2444,[288]6.2504,[289]6.2414,[290]6.2256,[291]6.2103,[292]6.1951,[293]6.1811,[294]6.1828,[295]6.1827,[296]6.1872,[297]6.1862,[298]6.1891,[299]6.1861,[300]6.1749,[301]6.1754,[302]6.1675,[303]6.1598,[304]6.1525,[305]6.1493,[306]6.1372,[307]6.1394,[308]6.1425,[309]6.1262,[310]6.1203,[311]6.1140,[312]6.1167,[313]6.1110,[314]6.1089,[315]6.0926,[316]6.0878,[317]6.0717,[318]6.0506,[319]6.0628,[320]6.0753,[321]6.0796,[322]6.0751,[323]6.0679,[324]6.0651,[325]6.0752,[326]6.0750,[327]6.0774,[328]6.0814,[329]6.0874,[330]6.0903,[331]6.1026,[332]6.0996,[333]6.1071,[334]6.1013,[335]6.0949,[336]6.0982,[337]6.0955,[338]6.0950,[339]6.0895,[340]6.0854,[341]6.0931,[342]6.0956,[343
]6.1010,[344]6.1008,[345]6.1010,[346]6.0982,[347]6.1029,[348]6.1068,[349]6.1091,[350]6.1059,[351]6.1065,[352]6.1069,[353]6.1010,[354]6.1012,[355]6.1060,[356]6.1085,[357]6.1050,[358]6.1139,[359]6.1172,[360]6.1131,[361]6.1129,[362]6.1199,[363]6.1314,[364]6.1378,[365]6.1433,[366]6.1446,[367]6.1537,[368]6.1516,[369]6.1520,[370]6.1534,[371]6.1477,[372]6.1523,[373]6.1577,[374]6.1565,[375]6.1564,[376]6.1639,[377]6.1590,[378]6.1620,[379]6.1680,[380]6.1599,[381]6.1561,[382]6.1509,[383]6.1499,[384]6.1493,[385]6.1483,[386]6.1480,[387]6.1477,[388]6.1436,[389]6.1383,[390]6.1312,[391]6.1237,[392]6.1200,[393]6.1181,[394]6.1208,[395]6.1191,[396]6.1118,[397]6.1195,[398]6.1230,[399]6.1309,[400]6.1304,[401]6.1321,[402]6.1328,[403]6.1345,[404]6.1409,[405]6.1306,[406]6.1269,[407]6.1261,[408]6.1277,[409]6.1397,[410]6.1507,[411]6.1629,[412]6.1787,[413]6.1903,[414]6.1977,[415]6.2031,[416]6.2106,[417]6.2225,[418]6.2259,[419]6.2329,[420]6.2411,[421]6.2527,[422]6.2576,[423]6.2649,[424]6.2764,[425]6.2850,[426]6.2916,[427]6.2959,[428]6.3042,[429]6.3090,[430]6.3177,[431]6.3316,[432]6.3355,[433]6.3346,[434]6.3300,[435]6.3307,[436]6.3327,[437]6.3426,[438]6.3501,[439]6.3470,[440]6.3464,[441]6.3413,[442]6.3400,[443]6.3409,[444]6.3409,[445]6.3390,[446]6.3416,[447]6.3444,[448]6.3485,[449]6.3458,[450]6.3467,[451]6.3425,[452]6.3308,[453]6.3220,[454]6.3162,[455]6.3172,[456]6.3217,[457]6.3237,[458]6.3211,[459]6.3215,[460]6.3300,[461]6.3269,[462]6.3255,[463]6.3297,[464]6.3287,[465]6.3258,[466]6.3180,[467]6.3182,[468]6.3181,[469]6.3199,[470]6.3208,[471]6.3161,[472]6.3203,[473]6.3148,[474]6.3160,[475]6.3103,[476]6.3128,[477]6.3055,[478]6.3042,[479]6.3107,[480]6.3153,[481]6.3170,[482]6.3124,[483]6.3084,[484]6.3108,[485]6.3091,[486]6.3034,[487]6.3033,[488]6.3010,[489]6.2961,[490]6.2937,[491]6.2907,[492]6.2844,[493]6.2812,[494]6.2794,[495]6.2797,[496]6.2760,[497]6.2702,[498]6.2683,[499]6.2638,[500]6.2542,[501]6.2474,[502]6.2477,[503]6.2469,[504]6.2380,[505]6.2404,[506]6.2413,[507]6.2361,[508]6.2320,[509]6.2311,
[510]6.2349,[511]6.2396,[512]6.2432,[513]6.2453,[514]6.2517,[515]6.2463,[516]6.2456,[517]6.2469,[518]6.2470,[519]6.2501,[520]6.2522,[521]6.2539,[522]6.2567,[523]6.2575,[524]6.2630,[525]6.2666,[526]6.2682,[527]6.2699,[528]6.2649,[529]6.2655,[530]6.2607,[531]6.2596,[532]6.2643,[533]6.2666,[534]6.2651,[535]6.2673,[536]6.2618,[537]6.2592,[538]6.2640,[539]6.2649,[540]6.2684,[541]6.2691,[542]6.2702,[543]6.2716,[544]6.2729,[545]6.2707,[546]6.2713,[547]6.2669,[548]6.2616,[549]6.2617,[550]6.2588,[551]6.2549,[552]6.2525,[553]6.2486,[554]6.2463,[555]6.2435,[556]6.2433,[557]6.2456,[558]6.2414,[559]6.2406,[560]6.2405,[561]6.2404,[562]6.2381,[563]6.2382,[564]6.2425,[565]6.2444,[566]6.2439,[567]6.2420,[568]6.2424,[569]6.2409,[570]6.2435,[571]6.2438,[572]6.2446,[573]6.2446,[574]6.2408,[575]6.2405,[576]6.2406,[577]6.2391,[578]6.2374,[579]6.2380,[580]6.2312,[581]6.2273,[582]6.2259,[583]6.2267,[584]6.2269,[585]6.2195,[586]6.2127,[587]6.2133,[588]6.2179,[589]6.2233,[590]6.2263,[591]6.2285,[592]6.2269,[593]6.2236,[594]6.2247,[595]6.2224,[596]6.2261,[597]6.2237,[598]6.2209,[599]6.2230,[600]6.2227,[601]6.2213,[602]6.2226,[603]6.2257,[604]6.2268,[605]6.2301,[606]6.2320,[607]6.2305,[608]6.2269,[609]6.2274,[610]6.2310,[611]6.2290,[612]6.2315,[613]6.2278,[614]6.2226,[615]6.2152,[616]6.2179,[617]6.2117,[618]6.2067,[619]6.2011,[620]6.1870,[621]6.1800,[622]6.1783,[623]6.1801,[624]6.1808,[625]6.1808,[626]6.1797,[627]6.1815,[628]6.1818,[629]6.1813,[630]6.1843,[631]6.1906,[632]6.1959,[633]6.1944,[634]6.1977,[635]6.1985,[636]6.1957,[637]6.1923,[638]6.1949,[639]6.1920,[640]6.1932,[641]6.1936,[642]6.2003,[643]6.2024,[644]6.2034,[645]6.2015,[646]6.2057,[647]6.2022,[648]6.2030,[649]6.2031,[650]6.2070,[651]6.2125,[652]6.2136,[653]6.2175,[654]6.2109,[655]6.2103,
llama_print_timings: load time = 3607.95 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1017333.52 ms / 335360 tokens ( 3.03 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1046946.49 ms
Q4_2 with "RMSE optimized"
main: seed = 1682181163
llama.cpp: loading model from ../models/7B/ggml-model-q4_2-rmse.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.73 seconds per pass - ETA 18 minutes
[1]4.4440,[2]4.8798,[3]5.7711,[4]6.3974,[5]6.5195,[6]6.4922,[7]6.6837,[8]6.7940,[9]7.1411,[10]7.3810,[11]7.5962,[12]7.6175,[13]7.5385,[14]7.6152,[15]7.8719,[16]7.4820,[17]7.3567,[18]7.3198,[19]6.9568,[20]6.9468,[21]6.8518,[22]6.6764,[23]6.6391,[24]6.5449,[25]6.5465,[26]6.3794,[27]6.1963,[28]6.0955,[29]6.0054,[30]5.8462,[31]5.8142,[32]5.8357,[33]5.7771,[34]5.8140,[35]5.8391,[36]5.8819,[37]5.8855,[38]5.9007,[39]5.9364,[40]5.9958,[41]6.0085,[42]6.0489,[43]6.0065,[44]6.0616,[45]6.0645,[46]6.0405,[47]6.0619,[48]6.0339,[49]6.0370,[50]5.9954,[51]5.9909,[52]5.9802,[53]6.0233,[54]6.0059,[55]5.9810,[56]6.0081,[57]6.0276,[58]6.0485,[59]6.0660,[60]6.1104,[61]6.1017,[62]6.1599,[63]6.1959,[64]6.2119,[65]6.2584,[66]6.2658,[67]6.2843,[68]6.3018,[69]6.3286,[70]6.3612,[71]6.3834,[72]6.4137,[73]6.4759,[74]6.4816,[75]6.4960,[76]6.5086,[77]6.5208,[78]6.5059,[79]6.5336,[80]6.5246,[81]6.5345,[82]6.5384,[83]6.4837,[84]6.4661,[85]6.4552,[86]6.4324,[87]6.3674,[88]6.3388,[89]6.3190,[90]6.3046,[91]6.3282,[92]6.3221,[93]6.3243,[94]6.3211,[95]6.3489,[96]6.3471,[97]6.3419,[98]6.3347,[99]6.3197,[100]6.3206,[101]6.3464,[102]6.3409,[103]6.3610,[104]6.3685,[105]6.3677,[106]6.3827,[107]6.3800,[108]6.3936,[109]6.3869,[110]6.3828,[111]6.4050,[112]6.4254,[113]6.4279,[114]6.4249,[115]6.4321,[116]6.4235,[117]6.4294,[118]6.4588,[119]6.4792,[120]6.5146,[121]6.5322,[122]6.5577,[123]6.5955,[124]6.6141,[125]6.6037,[126]6.6434,[127]6.6802,[128]6.7105,[129]6.6942,[130]6.7041,[131]6.7003,[132]6.6918,[133]6.6788,[134]6.6894,[135]6.6853,[136]6.6726,[137]6.6643,[138]6.6490,[139]6.6385,[140]6.6334,[141]6.6033,[142]6.5996,[143]6.5698,[144]6.5491,[145]6.5400,[146]6.5267,[147]6.5332,[148]6.5333,[149]6.5271,[150]6.5220,[151]6.5234,[152]6.5115,[153]6.4947,[154]6.4860,[155]6.4928,[156]6.4876,[157]6.5056,[158]6.5087,[159]6.5141,[160]6.5160,[161]6.5288,[162]6.4987,[163]6.4855,[164]6.4607,[165]6.4289,[166]6.4010,[167]6.3628,[168]6.3309,[169]6.3178,[170]6.3064,[171]6.2785,[172]6.2614,[173]6.2442,[174]6.2140,[175]6.1919,[176]6.1
819,[177]6.1614,[178]6.1377,[179]6.1208,[180]6.1120,[181]6.0904,[182]6.0726,[183]6.0587,[184]6.0583,[185]6.0510,[186]6.0519,[187]6.0579,[188]6.0534,[189]6.0706,[190]6.0719,[191]6.0937,[192]6.1099,[193]6.1273,[194]6.1386,[195]6.1597,[196]6.1757,[197]6.1967,[198]6.2118,[199]6.2158,[200]6.2204,[201]6.2158,[202]6.2365,[203]6.2446,[204]6.2435,[205]6.2541,[206]6.2608,[207]6.2572,[208]6.2656,[209]6.2698,[210]6.2754,[211]6.2851,[212]6.2925,[213]6.3030,[214]6.3053,[215]6.3091,[216]6.3244,[217]6.3425,[218]6.3557,[219]6.3563,[220]6.3518,[221]6.3470,[222]6.3443,[223]6.3336,[224]6.3263,[225]6.3224,[226]6.3434,[227]6.3521,[228]6.3568,[229]6.3628,[230]6.3592,[231]6.3762,[232]6.3632,[233]6.3463,[234]6.3313,[235]6.3143,[236]6.3070,[237]6.2967,[238]6.2997,[239]6.2840,[240]6.2739,[241]6.2765,[242]6.2802,[243]6.2782,[244]6.2666,[245]6.2637,[246]6.2519,[247]6.2394,[248]6.2319,[249]6.2299,[250]6.2338,[251]6.2269,[252]6.2233,[253]6.2134,[254]6.2092,[255]6.1981,[256]6.1801,[257]6.1681,[258]6.1595,[259]6.1577,[260]6.1503,[261]6.1463,[262]6.1409,[263]6.1356,[264]6.1163,[265]6.1153,[266]6.1139,[267]6.1074,[268]6.1167,[269]6.1148,[270]6.1158,[271]6.1237,[272]6.1267,[273]6.1266,[274]6.1285,[275]6.1363,[276]6.1420,[277]6.1576,[278]6.1678,[279]6.1765,[280]6.1794,[281]6.1886,[282]6.1948,[283]6.2096,[284]6.2172,[285]6.2260,[286]6.2396,[287]6.2395,[288]6.2452,[289]6.2363,[290]6.2205,[291]6.2051,[292]6.1898,[293]6.1759,[294]6.1779,[295]6.1775,[296]6.1815,[297]6.1798,[298]6.1826,[299]6.1796,[300]6.1682,[301]6.1685,[302]6.1608,[303]6.1527,[304]6.1446,[305]6.1421,[306]6.1294,[307]6.1317,[308]6.1351,[309]6.1193,[310]6.1132,[311]6.1070,[312]6.1099,[313]6.1042,[314]6.1026,[315]6.0863,[316]6.0813,[317]6.0650,[318]6.0438,[319]6.0560,[320]6.0686,[321]6.0729,[322]6.0686,[323]6.0617,[324]6.0592,[325]6.0694,[326]6.0692,[327]6.0709,[328]6.0746,[329]6.0808,[330]6.0833,[331]6.0958,[332]6.0928,[333]6.0999,[334]6.0941,[335]6.0872,[336]6.0905,[337]6.0878,[338]6.0876,[339]6.0821,[340]6.0778,[341]6.0856,[342]6.0879,[343
]6.0927,[344]6.0924,[345]6.0923,[346]6.0893,[347]6.0937,[348]6.0969,[349]6.0989,[350]6.0953,[351]6.0959,[352]6.0960,[353]6.0900,[354]6.0905,[355]6.0958,[356]6.0986,[357]6.0953,[358]6.1045,[359]6.1074,[360]6.1037,[361]6.1033,[362]6.1101,[363]6.1217,[364]6.1281,[365]6.1337,[366]6.1349,[367]6.1439,[368]6.1414,[369]6.1418,[370]6.1432,[371]6.1373,[372]6.1423,[373]6.1476,[374]6.1461,[375]6.1459,[376]6.1531,[377]6.1482,[378]6.1508,[379]6.1566,[380]6.1485,[381]6.1447,[382]6.1393,[383]6.1384,[384]6.1378,[385]6.1372,[386]6.1369,[387]6.1363,[388]6.1322,[389]6.1271,[390]6.1201,[391]6.1124,[392]6.1082,[393]6.1067,[394]6.1092,[395]6.1077,[396]6.1001,[397]6.1079,[398]6.1120,[399]6.1202,[400]6.1200,[401]6.1216,[402]6.1224,[403]6.1246,[404]6.1311,[405]6.1210,[406]6.1176,[407]6.1170,[408]6.1182,[409]6.1302,[410]6.1409,[411]6.1523,[412]6.1683,[413]6.1801,[414]6.1877,[415]6.1927,[416]6.2005,[417]6.2129,[418]6.2168,[419]6.2240,[420]6.2328,[421]6.2444,[422]6.2494,[423]6.2563,[424]6.2679,[425]6.2768,[426]6.2836,[427]6.2880,[428]6.2964,[429]6.3015,[430]6.3099,[431]6.3242,[432]6.3283,[433]6.3272,[434]6.3227,[435]6.3235,[436]6.3256,[437]6.3353,[438]6.3428,[439]6.3395,[440]6.3391,[441]6.3340,[442]6.3326,[443]6.3339,[444]6.3342,[445]6.3323,[446]6.3350,[447]6.3380,[448]6.3426,[449]6.3401,[450]6.3413,[451]6.3371,[452]6.3242,[453]6.3154,[454]6.3097,[455]6.3109,[456]6.3155,[457]6.3175,[458]6.3154,[459]6.3157,[460]6.3242,[461]6.3212,[462]6.3195,[463]6.3243,[464]6.3234,[465]6.3202,[466]6.3123,[467]6.3120,[468]6.3118,[469]6.3137,[470]6.3140,[471]6.3091,[472]6.3140,[473]6.3085,[474]6.3093,[475]6.3030,[476]6.3050,[477]6.2978,[478]6.2965,[479]6.3025,[480]6.3071,[481]6.3088,[482]6.3043,[483]6.3001,[484]6.3024,[485]6.3009,[486]6.2953,[487]6.2954,[488]6.2931,[489]6.2883,[490]6.2860,[491]6.2831,[492]6.2771,[493]6.2741,[494]6.2725,[495]6.2729,[496]6.2693,[497]6.2637,[498]6.2618,[499]6.2571,[500]6.2474,[501]6.2407,[502]6.2409,[503]6.2403,[504]6.2314,[505]6.2340,[506]6.2350,[507]6.2294,[508]6.2254,[509]6.2246,
[510]6.2283,[511]6.2332,[512]6.2365,[513]6.2385,[514]6.2448,[515]6.2393,[516]6.2383,[517]6.2392,[518]6.2392,[519]6.2423,[520]6.2448,[521]6.2464,[522]6.2495,[523]6.2503,[524]6.2558,[525]6.2594,[526]6.2606,[527]6.2624,[528]6.2574,[529]6.2577,[530]6.2529,[531]6.2519,[532]6.2568,[533]6.2590,[534]6.2574,[535]6.2598,[536]6.2543,[537]6.2519,[538]6.2565,[539]6.2576,[540]6.2614,[541]6.2616,[542]6.2628,[543]6.2641,[544]6.2653,[545]6.2629,[546]6.2637,[547]6.2593,[548]6.2544,[549]6.2541,[550]6.2511,[551]6.2475,[552]6.2454,[553]6.2414,[554]6.2390,[555]6.2361,[556]6.2357,[557]6.2379,[558]6.2342,[559]6.2335,[560]6.2334,[561]6.2333,[562]6.2312,[563]6.2312,[564]6.2356,[565]6.2377,[566]6.2374,[567]6.2353,[568]6.2358,[569]6.2342,[570]6.2368,[571]6.2374,[572]6.2383,[573]6.2385,[574]6.2353,[575]6.2347,[576]6.2346,[577]6.2333,[578]6.2312,[579]6.2318,[580]6.2249,[581]6.2211,[582]6.2199,[583]6.2208,[584]6.2210,[585]6.2136,[586]6.2069,[587]6.2072,[588]6.2121,[589]6.2175,[590]6.2203,[591]6.2224,[592]6.2210,[593]6.2175,[594]6.2183,[595]6.2161,[596]6.2195,[597]6.2174,[598]6.2143,[599]6.2163,[600]6.2158,[601]6.2143,[602]6.2159,[603]6.2191,[604]6.2200,[605]6.2235,[606]6.2254,[607]6.2237,[608]6.2204,[609]6.2211,[610]6.2246,[611]6.2227,[612]6.2254,[613]6.2216,[614]6.2164,[615]6.2090,[616]6.2118,[617]6.2056,[618]6.2005,[619]6.1948,[620]6.1807,[621]6.1735,[622]6.1719,[623]6.1735,[624]6.1740,[625]6.1742,[626]6.1729,[627]6.1749,[628]6.1750,[629]6.1744,[630]6.1775,[631]6.1833,[632]6.1889,[633]6.1872,[634]6.1905,[635]6.1913,[636]6.1882,[637]6.1849,[638]6.1876,[639]6.1847,[640]6.1856,[641]6.1859,[642]6.1924,[643]6.1945,[644]6.1956,[645]6.1937,[646]6.1978,[647]6.1939,[648]6.1950,[649]6.1951,[650]6.1992,[651]6.2048,[652]6.2057,[653]6.2096,[654]6.2033,[655]6.2027,
llama_print_timings: load time = 3848.02 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1049269.56 ms / 335360 tokens ( 3.13 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1077683.81 ms
Q4_2 with "Full Range"
main: seed = 1682180074
llama.cpp: loading model from ../models/7B/ggml-model-q4_2-rf.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.76 seconds per pass - ETA 19 minutes
[1]4.4749,[2]4.9086,[3]5.8151,[4]6.4201,[5]6.5312,[6]6.4852,[7]6.6751,[8]6.7806,[9]7.1247,[10]7.3582,[11]7.5762,[12]7.5943,[13]7.5365,[14]7.6084,[15]7.8596,[16]7.4638,[17]7.3435,[18]7.2949,[19]6.9279,[20]6.9139,[21]6.8255,[22]6.6517,[23]6.6192,[24]6.5227,[25]6.5208,[26]6.3603,[27]6.1824,[28]6.0854,[29]5.9912,[30]5.8311,[31]5.8032,[32]5.8244,[33]5.7619,[34]5.7963,[35]5.8206,[36]5.8613,[37]5.8662,[38]5.8812,[39]5.9143,[40]5.9735,[41]5.9842,[42]6.0253,[43]5.9850,[44]6.0408,[45]6.0419,[46]6.0143,[47]6.0358,[48]6.0083,[49]6.0115,[50]5.9696,[51]5.9628,[52]5.9539,[53]5.9977,[54]5.9808,[55]5.9551,[56]5.9864,[57]6.0079,[58]6.0283,[59]6.0473,[60]6.0934,[61]6.0852,[62]6.1432,[63]6.1787,[64]6.1931,[65]6.2400,[66]6.2456,[67]6.2624,[68]6.2809,[69]6.3056,[70]6.3372,[71]6.3564,[72]6.3872,[73]6.4511,[74]6.4571,[75]6.4709,[76]6.4817,[77]6.4933,[78]6.4788,[79]6.5080,[80]6.4997,[81]6.5092,[82]6.5136,[83]6.4609,[84]6.4436,[85]6.4315,[86]6.4075,[87]6.3438,[88]6.3151,[89]6.2948,[90]6.2796,[91]6.3024,[92]6.2957,[93]6.2980,[94]6.2948,[95]6.3237,[96]6.3209,[97]6.3157,[98]6.3083,[99]6.2944,[100]6.2935,[101]6.3191,[102]6.3134,[103]6.3339,[104]6.3412,[105]6.3403,[106]6.3551,[107]6.3528,[108]6.3659,[109]6.3597,[110]6.3557,[111]6.3777,[112]6.3979,[113]6.3994,[114]6.3952,[115]6.4014,[116]6.3925,[117]6.3976,[118]6.4261,[119]6.4463,[120]6.4821,[121]6.4981,[122]6.5242,[123]6.5608,[124]6.5788,[125]6.5687,[126]6.6077,[127]6.6434,[128]6.6736,[129]6.6568,[130]6.6658,[131]6.6616,[132]6.6529,[133]6.6393,[134]6.6498,[135]6.6457,[136]6.6327,[137]6.6247,[138]6.6090,[139]6.5990,[140]6.5941,[141]6.5649,[142]6.5612,[143]6.5320,[144]6.5122,[145]6.5038,[146]6.4914,[147]6.4989,[148]6.4991,[149]6.4936,[150]6.4894,[151]6.4902,[152]6.4794,[153]6.4622,[154]6.4534,[155]6.4607,[156]6.4554,[157]6.4734,[158]6.4773,[159]6.4825,[160]6.4839,[161]6.4960,[162]6.4661,[163]6.4529,[164]6.4285,[165]6.3972,[166]6.3696,[167]6.3321,[168]6.3001,[169]6.2867,[170]6.2746,[171]6.2470,[172]6.2293,[173]6.2129,[174]6.1826,[175]6.1605,[176]6.1
500,[177]6.1290,[178]6.1054,[179]6.0884,[180]6.0792,[181]6.0579,[182]6.0402,[183]6.0262,[184]6.0254,[185]6.0175,[186]6.0181,[187]6.0243,[188]6.0194,[189]6.0371,[190]6.0383,[191]6.0599,[192]6.0760,[193]6.0935,[194]6.1050,[195]6.1269,[196]6.1430,[197]6.1644,[198]6.1794,[199]6.1833,[200]6.1881,[201]6.1832,[202]6.2034,[203]6.2113,[204]6.2103,[205]6.2209,[206]6.2273,[207]6.2237,[208]6.2319,[209]6.2361,[210]6.2415,[211]6.2508,[212]6.2585,[213]6.2686,[214]6.2712,[215]6.2746,[216]6.2897,[217]6.3075,[218]6.3206,[219]6.3208,[220]6.3167,[221]6.3121,[222]6.3088,[223]6.2995,[224]6.2916,[225]6.2877,[226]6.3079,[227]6.3166,[228]6.3217,[229]6.3273,[230]6.3243,[231]6.3413,[232]6.3284,[233]6.3117,[234]6.2968,[235]6.2792,[236]6.2722,[237]6.2619,[238]6.2646,[239]6.2493,[240]6.2394,[241]6.2423,[242]6.2463,[243]6.2446,[244]6.2329,[245]6.2298,[246]6.2182,[247]6.2058,[248]6.1983,[249]6.1960,[250]6.2000,[251]6.1925,[252]6.1886,[253]6.1789,[254]6.1747,[255]6.1635,[256]6.1454,[257]6.1336,[258]6.1252,[259]6.1233,[260]6.1158,[261]6.1118,[262]6.1063,[263]6.1012,[264]6.0796,[265]6.0785,[266]6.0765,[267]6.0699,[268]6.0797,[269]6.0777,[270]6.0787,[271]6.0864,[272]6.0899,[273]6.0901,[274]6.0923,[275]6.1004,[276]6.1062,[277]6.1216,[278]6.1319,[279]6.1411,[280]6.1437,[281]6.1526,[282]6.1587,[283]6.1731,[284]6.1810,[285]6.1898,[286]6.2036,[287]6.2036,[288]6.2095,[289]6.2004,[290]6.1846,[291]6.1693,[292]6.1542,[293]6.1402,[294]6.1420,[295]6.1420,[296]6.1460,[297]6.1448,[298]6.1479,[299]6.1448,[300]6.1335,[301]6.1335,[302]6.1254,[303]6.1170,[304]6.1095,[305]6.1067,[306]6.0945,[307]6.0972,[308]6.1005,[309]6.0846,[310]6.0785,[311]6.0723,[312]6.0751,[313]6.0695,[314]6.0678,[315]6.0515,[316]6.0468,[317]6.0307,[318]6.0098,[319]6.0222,[320]6.0348,[321]6.0391,[322]6.0347,[323]6.0276,[324]6.0247,[325]6.0346,[326]6.0342,[327]6.0363,[328]6.0402,[329]6.0466,[330]6.0495,[331]6.0618,[332]6.0587,[333]6.0657,[334]6.0602,[335]6.0532,[336]6.0564,[337]6.0538,[338]6.0536,[339]6.0484,[340]6.0442,[341]6.0521,[342]6.0546,[343
]6.0600,[344]6.0595,[345]6.0599,[346]6.0572,[347]6.0613,[348]6.0647,[349]6.0670,[350]6.0637,[351]6.0643,[352]6.0647,[353]6.0589,[354]6.0593,[355]6.0641,[356]6.0665,[357]6.0631,[358]6.0721,[359]6.0753,[360]6.0720,[361]6.0718,[362]6.0789,[363]6.0905,[364]6.0970,[365]6.1029,[366]6.1038,[367]6.1126,[368]6.1103,[369]6.1107,[370]6.1122,[371]6.1064,[372]6.1110,[373]6.1161,[374]6.1148,[375]6.1150,[376]6.1223,[377]6.1175,[378]6.1201,[379]6.1259,[380]6.1178,[381]6.1139,[382]6.1085,[383]6.1075,[384]6.1068,[385]6.1061,[386]6.1056,[387]6.1054,[388]6.1014,[389]6.0962,[390]6.0894,[391]6.0819,[392]6.0778,[393]6.0762,[394]6.0785,[395]6.0771,[396]6.0696,[397]6.0771,[398]6.0812,[399]6.0891,[400]6.0890,[401]6.0905,[402]6.0913,[403]6.0933,[404]6.0999,[405]6.0898,[406]6.0862,[407]6.0855,[408]6.0869,[409]6.0990,[410]6.1098,[411]6.1213,[412]6.1370,[413]6.1487,[414]6.1563,[415]6.1613,[416]6.1690,[417]6.1809,[418]6.1844,[419]6.1913,[420]6.1996,[421]6.2113,[422]6.2159,[423]6.2227,[424]6.2341,[425]6.2427,[426]6.2493,[427]6.2537,[428]6.2619,[429]6.2669,[430]6.2753,[431]6.2894,[432]6.2936,[433]6.2925,[434]6.2880,[435]6.2889,[436]6.2912,[437]6.3009,[438]6.3084,[439]6.3053,[440]6.3050,[441]6.3000,[442]6.2984,[443]6.2998,[444]6.2998,[445]6.2978,[446]6.3005,[447]6.3035,[448]6.3077,[449]6.3050,[450]6.3063,[451]6.3021,[452]6.2894,[453]6.2806,[454]6.2750,[455]6.2758,[456]6.2802,[457]6.2822,[458]6.2800,[459]6.2805,[460]6.2891,[461]6.2861,[462]6.2847,[463]6.2893,[464]6.2884,[465]6.2855,[466]6.2777,[467]6.2778,[468]6.2777,[469]6.2797,[470]6.2804,[471]6.2755,[472]6.2801,[473]6.2748,[474]6.2760,[475]6.2700,[476]6.2722,[477]6.2652,[478]6.2639,[479]6.2699,[480]6.2745,[481]6.2761,[482]6.2715,[483]6.2672,[484]6.2694,[485]6.2678,[486]6.2620,[487]6.2621,[488]6.2600,[489]6.2553,[490]6.2531,[491]6.2501,[492]6.2440,[493]6.2410,[494]6.2395,[495]6.2394,[496]6.2358,[497]6.2300,[498]6.2281,[499]6.2236,[500]6.2141,[501]6.2072,[502]6.2075,[503]6.2066,[504]6.1980,[505]6.2006,[506]6.2014,[507]6.1958,[508]6.1915,[509]6.1905,
[510]6.1943,[511]6.1988,[512]6.2021,[513]6.2042,[514]6.2107,[515]6.2051,[516]6.2044,[517]6.2055,[518]6.2054,[519]6.2085,[520]6.2107,[521]6.2123,[522]6.2152,[523]6.2160,[524]6.2217,[525]6.2251,[526]6.2264,[527]6.2281,[528]6.2232,[529]6.2235,[530]6.2188,[531]6.2177,[532]6.2223,[533]6.2246,[534]6.2233,[535]6.2258,[536]6.2202,[537]6.2177,[538]6.2225,[539]6.2235,[540]6.2272,[541]6.2277,[542]6.2288,[543]6.2305,[544]6.2317,[545]6.2294,[546]6.2301,[547]6.2257,[548]6.2207,[549]6.2207,[550]6.2178,[551]6.2140,[552]6.2117,[553]6.2079,[554]6.2056,[555]6.2027,[556]6.2023,[557]6.2045,[558]6.2006,[559]6.1998,[560]6.1993,[561]6.1994,[562]6.1969,[563]6.1969,[564]6.2012,[565]6.2032,[566]6.2027,[567]6.2008,[568]6.2013,[569]6.1998,[570]6.2023,[571]6.2029,[572]6.2037,[573]6.2037,[574]6.2001,[575]6.1997,[576]6.1998,[577]6.1981,[578]6.1962,[579]6.1968,[580]6.1899,[581]6.1860,[582]6.1848,[583]6.1857,[584]6.1858,[585]6.1785,[586]6.1717,[587]6.1723,[588]6.1769,[589]6.1823,[590]6.1851,[591]6.1874,[592]6.1860,[593]6.1826,[594]6.1836,[595]6.1814,[596]6.1851,[597]6.1830,[598]6.1800,[599]6.1821,[600]6.1817,[601]6.1802,[602]6.1817,[603]6.1849,[604]6.1859,[605]6.1894,[606]6.1914,[607]6.1897,[608]6.1864,[609]6.1871,[610]6.1905,[611]6.1883,[612]6.1908,[613]6.1870,[614]6.1818,[615]6.1744,[616]6.1773,[617]6.1711,[618]6.1661,[619]6.1605,[620]6.1465,[621]6.1396,[622]6.1380,[623]6.1395,[624]6.1402,[625]6.1404,[626]6.1392,[627]6.1412,[628]6.1413,[629]6.1408,[630]6.1440,[631]6.1501,[632]6.1554,[633]6.1541,[634]6.1575,[635]6.1583,[636]6.1552,[637]6.1517,[638]6.1546,[639]6.1514,[640]6.1524,[641]6.1528,[642]6.1595,[643]6.1619,[644]6.1629,[645]6.1610,[646]6.1649,[647]6.1611,[648]6.1621,[649]6.1623,[650]6.1662,[651]6.1717,[652]6.1728,[653]6.1769,[654]6.1704,[655]6.1698,
llama_print_timings: load time = 4366.83 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1057648.70 ms / 335360 tokens ( 3.15 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1088862.88 ms
Used branches:
- https://github.com/ggerganov/llama.cpp/tree/q4_0-q4_2-range-fix
- https://github.com/ggerganov/llama.cpp/tree/gg/rmse_quantization
Also, is there an extension of this approach to `Q4_1` and `Q4_3`?

These already use the full range: `min` maps to 0 and `max` maps to 15.
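To illustrate why `Q4_1` already uses the full range, here is a minimal sketch of that asymmetric scheme in Python (function names are illustrative, not the ggml API): the block minimum quantizes to 0 and the maximum to 15, so all 16 values are used by construction.

```python
def quantize_q4_1(block):
    """Sketch of Q4_1-style asymmetric quantization: q = round((v - min) / d)."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15 if hi != lo else 0.0   # scale so that max -> 15
    inv_d = 1.0 / d if d else 0.0
    quants = [max(0, min(15, round((v - lo) * inv_d))) for v in block]
    return d, lo, quants

def dequantize_q4_1(d, lo, quants):
    # reconstruction: v ~= q * d + min
    return [q * d + lo for q in quants]
```

Since both endpoints are hit exactly, there is no unused quant value to reclaim, which is why the sign trick from this PR does not transfer directly.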
@unbounded @sw @ikawrakow
Based on the results in my previous comment, it seems that RMSE optimization is not optimal for perplexity - at least for `Q4_0` and `Q4_2` on 7B. These are fresh runs with cuBLAS using this branch and #1106, both rebased on latest `master`. I will run the same tests on 13B to see if this behaviour persists.
I think one other "feature" of this approach is that it uses the `max` to do the scaling. Maybe the gain is not from "the utilization of the full range of quant values" but rather from scaling with the signed max value instead of `abs(max)` (as @sw has suggested in other comments). Maybe the extension to `Q4_1` and `Q4_3` is the same - use `max` to do the scaling and clamp the low values to 0. Not sure, but these numbers indicate we should look more into this.
Edit: Started the 4 13B runs - will report the results tomorrow
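As a reference for the idea being tested, here is a hedged Python sketch of the "full range" `Q4_0` quantization from this PR (the actual change is in the C reference implementation; this is only an illustrative mirror): by dividing by the signed value of largest magnitude rather than its absolute value, that value maps exactly to -8, so the otherwise-unused end of the [-8, 7] range is put to work.

```python
def quantize_q4_0_full_range(block):
    """Sketch of the full-range Q4_0 idea: scale so the signed max-magnitude
    value quantizes exactly to -8."""
    amax, vmax = 0.0, 0.0
    for v in block:
        if abs(v) > amax:
            amax, vmax = abs(v), v   # keep the sign of the largest value
    d = vmax / -8 if vmax else 0.0    # vmax / d == -8 by construction
    inv_d = 1.0 / d if d else 0.0
    quants = [max(-8, min(7, round(v * inv_d))) for v in block]
    return d, quants

def dequantize_q4_0(d, quants):
    return [d * q for q in quants]
```

Because only the stored scale changes sign, the output is fully compatible with the existing `Q4_0` format, matching the backwards-compatibility claim in the PR description.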
Results for 13B
| Method | RMSE | Full Range |
|---|---|---|
| Q4_0 | 5.4083 | 5.3748 |
| Q4_2 | 5.3468 | 5.3433 |
Q4_0 with "RMSE optimized"
main: seed = 1682197546
llama.cpp: loading model from ../models/13B/ggml-model-q4_0-rmse.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 7945693.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.76 seconds per pass - ETA 30 minutes
[1]3.8509,[2]4.3030,[3]5.1071,[4]5.5403,[5]5.7289,[6]5.6695,[7]5.8188,[8]5.9219,[9]6.1871,[10]6.4244,[11]6.6122,[12]6.6607,[13]6.6188,[14]6.7096,[15]6.9146,[16]6.5811,[17]6.4931,[18]6.4638,[19]6.1623,[20]6.1432,[21]6.0655,[22]5.8854,[23]5.8581,[24]5.7610,[25]5.7688,[26]5.6141,[27]5.4337,[28]5.3247,[29]5.2482,[30]5.1044,[31]5.0621,[32]5.0724,[33]5.0306,[34]5.0756,[35]5.0958,[36]5.1156,[37]5.1068,[38]5.1000,[39]5.1336,[40]5.1750,[41]5.1979,[42]5.2323,[43]5.1939,[44]5.2371,[45]5.2408,[46]5.2157,[47]5.2425,[48]5.2274,[49]5.2298,[50]5.1977,[51]5.2053,[52]5.1976,[53]5.2416,[54]5.2309,[55]5.2117,[56]5.2328,[57]5.2519,[58]5.2730,[59]5.2919,[60]5.3287,[61]5.3221,[62]5.3794,[63]5.4037,[64]5.4140,[65]5.4519,[66]5.4518,[67]5.4709,[68]5.4832,[69]5.5108,[70]5.5413,[71]5.5652,[72]5.6000,[73]5.6479,[74]5.6545,[75]5.6651,[76]5.6813,[77]5.6921,[78]5.6784,[79]5.7043,[80]5.6986,[81]5.7063,[82]5.7043,[83]5.6578,[84]5.6454,[85]5.6379,[86]5.6220,[87]5.5553,[88]5.5143,[89]5.4922,[90]5.4816,[91]5.5036,[92]5.4985,[93]5.4998,[94]5.4989,[95]5.5251,[96]5.5220,[97]5.5203,[98]5.5168,[99]5.5089,[100]5.5054,[101]5.5288,[102]5.5257,[103]5.5408,[104]5.5455,[105]5.5469,[106]5.5621,[107]5.5595,[108]5.5758,[109]5.5740,[110]5.5686,[111]5.5865,[112]5.6039,[113]5.6034,[114]5.6011,[115]5.6051,[116]5.5930,[117]5.5933,[118]5.6174,[119]5.6354,[120]5.6659,[121]5.6819,[122]5.7038,[123]5.7417,[124]5.7589,[125]5.7534,[126]5.7893,[127]5.8233,[128]5.8529,[129]5.8411,[130]5.8486,[131]5.8442,[132]5.8410,[133]5.8300,[134]5.8382,[135]5.8396,[136]5.8306,[137]5.8271,[138]5.8134,[139]5.8057,[140]5.8046,[141]5.7759,[142]5.7723,[143]5.7475,[144]5.7320,[145]5.7229,[146]5.7119,[147]5.7166,[148]5.7180,[149]5.7142,[150]5.7129,[151]5.7172,[152]5.7111,[153]5.7017,[154]5.6958,[155]5.7025,[156]5.7007,[157]5.7165,[158]5.7188,[159]5.7197,[160]5.7228,[161]5.7337,[162]5.7084,[163]5.6990,[164]5.6780,[165]5.6530,[166]5.6294,[167]5.5975,[168]5.5700,[169]5.5564,[170]5.5484,[171]5.5278,[172]5.5155,[173]5.5021,[174]5.4751,[175]5.4550,[176]5.4
418,[177]5.4256,[178]5.4049,[179]5.3916,[180]5.3845,[181]5.3680,[182]5.3515,[183]5.3391,[184]5.3386,[185]5.3316,[186]5.3321,[187]5.3375,[188]5.3345,[189]5.3512,[190]5.3516,[191]5.3693,[192]5.3825,[193]5.3980,[194]5.4088,[195]5.4281,[196]5.4395,[197]5.4589,[198]5.4728,[199]5.4747,[200]5.4750,[201]5.4686,[202]5.4814,[203]5.4875,[204]5.4833,[205]5.4925,[206]5.4974,[207]5.4935,[208]5.4989,[209]5.5022,[210]5.5080,[211]5.5185,[212]5.5245,[213]5.5334,[214]5.5370,[215]5.5408,[216]5.5527,[217]5.5688,[218]5.5827,[219]5.5825,[220]5.5797,[221]5.5753,[222]5.5759,[223]5.5693,[224]5.5625,[225]5.5591,[226]5.5794,[227]5.5861,[228]5.5933,[229]5.6005,[230]5.5968,[231]5.6128,[232]5.6024,[233]5.5875,[234]5.5728,[235]5.5499,[236]5.5445,[237]5.5358,[238]5.5390,[239]5.5272,[240]5.5180,[241]5.5207,[242]5.5223,[243]5.5208,[244]5.5106,[245]5.5064,[246]5.4961,[247]5.4863,[248]5.4800,[249]5.4764,[250]5.4802,[251]5.4719,[252]5.4666,[253]5.4574,[254]5.4524,[255]5.4425,[256]5.4262,[257]5.4159,[258]5.4089,[259]5.4079,[260]5.3996,[261]5.3950,[262]5.3909,[263]5.3859,[264]5.3634,[265]5.3635,[266]5.3605,[267]5.3541,[268]5.3607,[269]5.3604,[270]5.3613,[271]5.3676,[272]5.3705,[273]5.3720,[274]5.3735,[275]5.3795,[276]5.3851,[277]5.3978,[278]5.4059,[279]5.4145,[280]5.4179,[281]5.4278,[282]5.4329,[283]5.4458,[284]5.4544,[285]5.4621,[286]5.4747,[287]5.4718,[288]5.4778,[289]5.4713,[290]5.4573,[291]5.4447,[292]5.4312,[293]5.4191,[294]5.4197,[295]5.4194,[296]5.4243,[297]5.4231,[298]5.4250,[299]5.4226,[300]5.4138,[301]5.4141,[302]5.4080,[303]5.3999,[304]5.3924,[305]5.3901,[306]5.3795,[307]5.3823,[308]5.3834,[309]5.3700,[310]5.3673,[311]5.3635,[312]5.3656,[313]5.3598,[314]5.3580,[315]5.3451,[316]5.3417,[317]5.3293,[318]5.3131,[319]5.3241,[320]5.3356,[321]5.3399,[322]5.3368,[323]5.3305,[324]5.3288,[325]5.3387,[326]5.3402,[327]5.3409,[328]5.3439,[329]5.3490,[330]5.3510,[331]5.3608,[332]5.3573,[333]5.3650,[334]5.3602,[335]5.3550,[336]5.3569,[337]5.3558,[338]5.3556,[339]5.3514,[340]5.3488,[341]5.3554,[342]5.3584,[343
]5.3627,[344]5.3633,[345]5.3644,[346]5.3630,[347]5.3664,[348]5.3702,[349]5.3721,[350]5.3699,[351]5.3713,[352]5.3716,[353]5.3666,[354]5.3667,[355]5.3715,[356]5.3744,[357]5.3711,[358]5.3794,[359]5.3813,[360]5.3779,[361]5.3776,[362]5.3845,[363]5.3956,[364]5.4010,[365]5.4054,[366]5.4072,[367]5.4158,[368]5.4133,[369]5.4145,[370]5.4164,[371]5.4122,[372]5.4168,[373]5.4209,[374]5.4186,[375]5.4179,[376]5.4235,[377]5.4202,[378]5.4224,[379]5.4264,[380]5.4199,[381]5.4168,[382]5.4129,[383]5.4112,[384]5.4112,[385]5.4103,[386]5.4094,[387]5.4091,[388]5.4058,[389]5.4020,[390]5.3964,[391]5.3905,[392]5.3868,[393]5.3868,[394]5.3901,[395]5.3893,[396]5.3843,[397]5.3908,[398]5.3951,[399]5.4021,[400]5.4013,[401]5.4018,[402]5.4032,[403]5.4054,[404]5.4104,[405]5.3950,[406]5.3913,[407]5.3902,[408]5.3916,[409]5.4028,[410]5.4117,[411]5.4215,[412]5.4358,[413]5.4459,[414]5.4523,[415]5.4584,[416]5.4658,[417]5.4755,[418]5.4781,[419]5.4830,[420]5.4910,[421]5.5007,[422]5.5042,[423]5.5102,[424]5.5196,[425]5.5272,[426]5.5338,[427]5.5382,[428]5.5454,[429]5.5494,[430]5.5555,[431]5.5683,[432]5.5715,[433]5.5707,[434]5.5672,[435]5.5686,[436]5.5715,[437]5.5796,[438]5.5871,[439]5.5845,[440]5.5840,[441]5.5793,[442]5.5780,[443]5.5792,[444]5.5808,[445]5.5800,[446]5.5820,[447]5.5842,[448]5.5873,[449]5.5858,[450]5.5866,[451]5.5837,[452]5.5682,[453]5.5590,[454]5.5537,[455]5.5542,[456]5.5581,[457]5.5594,[458]5.5576,[459]5.5572,[460]5.5649,[461]5.5607,[462]5.5569,[463]5.5561,[464]5.5556,[465]5.5532,[466]5.5457,[467]5.5448,[468]5.5427,[469]5.5440,[470]5.5430,[471]5.5381,[472]5.5392,[473]5.5343,[474]5.5338,[475]5.5270,[476]5.5251,[477]5.5170,[478]5.5144,[479]5.5149,[480]5.5174,[481]5.5175,[482]5.5128,[483]5.5088,[484]5.5102,[485]5.5038,[486]5.4973,[487]5.4964,[488]5.4936,[489]5.4887,[490]5.4856,[491]5.4820,[492]5.4753,[493]5.4722,[494]5.4709,[495]5.4690,[496]5.4651,[497]5.4588,[498]5.4565,[499]5.4530,[500]5.4448,[501]5.4376,[502]5.4363,[503]5.4353,[504]5.4274,[505]5.4277,[506]5.4283,[507]5.4229,[508]5.4192,[509]5.4196,
[510]5.4220,[511]5.4261,[512]5.4302,[513]5.4329,[514]5.4382,[515]5.4341,[516]5.4329,[517]5.4328,[518]5.4328,[519]5.4348,[520]5.4360,[521]5.4371,[522]5.4384,[523]5.4390,[524]5.4443,[525]5.4474,[526]5.4477,[527]5.4492,[528]5.4437,[529]5.4448,[530]5.4406,[531]5.4398,[532]5.4448,[533]5.4475,[534]5.4455,[535]5.4478,[536]5.4433,[537]5.4416,[538]5.4468,[539]5.4476,[540]5.4495,[541]5.4492,[542]5.4507,[543]5.4528,[544]5.4542,[545]5.4530,[546]5.4534,[547]5.4501,[548]5.4460,[549]5.4462,[550]5.4441,[551]5.4413,[552]5.4392,[553]5.4362,[554]5.4338,[555]5.4318,[556]5.4312,[557]5.4333,[558]5.4299,[559]5.4300,[560]5.4287,[561]5.4289,[562]5.4261,[563]5.4262,[564]5.4303,[565]5.4316,[566]5.4320,[567]5.4301,[568]5.4311,[569]5.4297,[570]5.4323,[571]5.4337,[572]5.4346,[573]5.4349,[574]5.4321,[575]5.4306,[576]5.4301,[577]5.4284,[578]5.4264,[579]5.4263,[580]5.4209,[581]5.4180,[582]5.4182,[583]5.4189,[584]5.4193,[585]5.4133,[586]5.4081,[587]5.4081,[588]5.4125,[589]5.4175,[590]5.4206,[591]5.4224,[592]5.4212,[593]5.4174,[594]5.4185,[595]5.4169,[596]5.4208,[597]5.4189,[598]5.4156,[599]5.4181,[600]5.4172,[601]5.4163,[602]5.4164,[603]5.4193,[604]5.4200,[605]5.4222,[606]5.4235,[607]5.4222,[608]5.4194,[609]5.4202,[610]5.4243,[611]5.4231,[612]5.4253,[613]5.4224,[614]5.4188,[615]5.4128,[616]5.4153,[617]5.4103,[618]5.4059,[619]5.4013,[620]5.3901,[621]5.3846,[622]5.3828,[623]5.3843,[624]5.3848,[625]5.3854,[626]5.3851,[627]5.3879,[628]5.3888,[629]5.3894,[630]5.3924,[631]5.3971,[632]5.4019,[633]5.4005,[634]5.4036,[635]5.4032,[636]5.4002,[637]5.3965,[638]5.3985,[639]5.3954,[640]5.3960,[641]5.3963,[642]5.4015,[643]5.4034,[644]5.4053,[645]5.4038,[646]5.4072,[647]5.4021,[648]5.4032,[649]5.4035,[650]5.4066,[651]5.4108,[652]5.4110,[653]5.4149,[654]5.4091,[655]5.4083,
llama_print_timings: load time = 6746.60 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1716958.36 ms / 335360 tokens ( 5.12 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1748260.13 ms
Q4_0 with "Full Range"
main: seed = 1682195797
llama.cpp: loading model from ../models/13B/ggml-model-q4_0-rf.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 7945693.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.72 seconds per pass - ETA 29 minutes
[1]3.8168,[2]4.2896,[3]5.1193,[4]5.5227,[5]5.7019,[6]5.6465,[7]5.7810,[8]5.8927,[9]6.1565,[10]6.3840,[11]6.5709,[12]6.6243,[13]6.5791,[14]6.6660,[15]6.8723,[16]6.5417,[17]6.4575,[18]6.4328,[19]6.1342,[20]6.1057,[21]6.0311,[22]5.8525,[23]5.8259,[24]5.7301,[25]5.7434,[26]5.5895,[27]5.4122,[28]5.3098,[29]5.2347,[30]5.0910,[31]5.0511,[32]5.0650,[33]5.0215,[34]5.0639,[35]5.0836,[36]5.1057,[37]5.1009,[38]5.0972,[39]5.1267,[40]5.1685,[41]5.1919,[42]5.2244,[43]5.1870,[44]5.2287,[45]5.2318,[46]5.2029,[47]5.2314,[48]5.2160,[49]5.2191,[50]5.1870,[51]5.1944,[52]5.1853,[53]5.2298,[54]5.2194,[55]5.2013,[56]5.2221,[57]5.2412,[58]5.2637,[59]5.2815,[60]5.3179,[61]5.3114,[62]5.3660,[63]5.3897,[64]5.3998,[65]5.4358,[66]5.4368,[67]5.4566,[68]5.4682,[69]5.4937,[70]5.5228,[71]5.5449,[72]5.5803,[73]5.6275,[74]5.6359,[75]5.6467,[76]5.6627,[77]5.6733,[78]5.6598,[79]5.6868,[80]5.6806,[81]5.6894,[82]5.6861,[83]5.6402,[84]5.6290,[85]5.6222,[86]5.6058,[87]5.5391,[88]5.4956,[89]5.4747,[90]5.4641,[91]5.4860,[92]5.4824,[93]5.4830,[94]5.4811,[95]5.5078,[96]5.5054,[97]5.5023,[98]5.4999,[99]5.4906,[100]5.4868,[101]5.5105,[102]5.5058,[103]5.5213,[104]5.5257,[105]5.5269,[106]5.5415,[107]5.5393,[108]5.5553,[109]5.5543,[110]5.5488,[111]5.5680,[112]5.5849,[113]5.5836,[114]5.5815,[115]5.5864,[116]5.5743,[117]5.5739,[118]5.5973,[119]5.6155,[120]5.6436,[121]5.6593,[122]5.6812,[123]5.7182,[124]5.7357,[125]5.7304,[126]5.7663,[127]5.7995,[128]5.8285,[129]5.8166,[130]5.8244,[131]5.8196,[132]5.8166,[133]5.8048,[134]5.8126,[135]5.8133,[136]5.8037,[137]5.7999,[138]5.7864,[139]5.7787,[140]5.7775,[141]5.7501,[142]5.7457,[143]5.7206,[144]5.7053,[145]5.6958,[146]5.6844,[147]5.6897,[148]5.6917,[149]5.6879,[150]5.6871,[151]5.6915,[152]5.6853,[153]5.6751,[154]5.6693,[155]5.6761,[156]5.6735,[157]5.6899,[158]5.6922,[159]5.6926,[160]5.6956,[161]5.7068,[162]5.6808,[163]5.6715,[164]5.6512,[165]5.6259,[166]5.6031,[167]5.5709,[168]5.5430,[169]5.5290,[170]5.5209,[171]5.5000,[172]5.4875,[173]5.4750,[174]5.4484,[175]5.4283,[176]5.4
149,[177]5.3983,[178]5.3779,[179]5.3646,[180]5.3579,[181]5.3414,[182]5.3251,[183]5.3130,[184]5.3120,[185]5.3046,[186]5.3054,[187]5.3108,[188]5.3082,[189]5.3246,[190]5.3252,[191]5.3424,[192]5.3559,[193]5.3711,[194]5.3821,[195]5.4015,[196]5.4130,[197]5.4319,[198]5.4455,[199]5.4473,[200]5.4484,[201]5.4422,[202]5.4557,[203]5.4614,[204]5.4571,[205]5.4659,[206]5.4707,[207]5.4666,[208]5.4726,[209]5.4760,[210]5.4814,[211]5.4920,[212]5.4983,[213]5.5075,[214]5.5104,[215]5.5140,[216]5.5262,[217]5.5425,[218]5.5563,[219]5.5554,[220]5.5522,[221]5.5472,[222]5.5474,[223]5.5407,[224]5.5336,[225]5.5303,[226]5.5503,[227]5.5556,[228]5.5629,[229]5.5700,[230]5.5660,[231]5.5816,[232]5.5713,[233]5.5568,[234]5.5421,[235]5.5199,[236]5.5142,[237]5.5052,[238]5.5084,[239]5.4972,[240]5.4882,[241]5.4908,[242]5.4924,[243]5.4909,[244]5.4809,[245]5.4774,[246]5.4673,[247]5.4573,[248]5.4510,[249]5.4478,[250]5.4514,[251]5.4436,[252]5.4385,[253]5.4292,[254]5.4248,[255]5.4153,[256]5.3990,[257]5.3890,[258]5.3823,[259]5.3815,[260]5.3735,[261]5.3690,[262]5.3652,[263]5.3604,[264]5.3372,[265]5.3373,[266]5.3347,[267]5.3280,[268]5.3346,[269]5.3342,[270]5.3349,[271]5.3413,[272]5.3440,[273]5.3450,[274]5.3466,[275]5.3527,[276]5.3586,[277]5.3710,[278]5.3793,[279]5.3875,[280]5.3911,[281]5.4011,[282]5.4062,[283]5.4189,[284]5.4276,[285]5.4352,[286]5.4477,[287]5.4441,[288]5.4497,[289]5.4431,[290]5.4294,[291]5.4165,[292]5.4032,[293]5.3913,[294]5.3921,[295]5.3920,[296]5.3970,[297]5.3959,[298]5.3982,[299]5.3959,[300]5.3870,[301]5.3872,[302]5.3810,[303]5.3727,[304]5.3655,[305]5.3628,[306]5.3518,[307]5.3546,[308]5.3557,[309]5.3423,[310]5.3393,[311]5.3352,[312]5.3370,[313]5.3315,[314]5.3296,[315]5.3167,[316]5.3135,[317]5.3007,[318]5.2844,[319]5.2946,[320]5.3061,[321]5.3101,[322]5.3070,[323]5.3012,[324]5.2994,[325]5.3089,[326]5.3104,[327]5.3108,[328]5.3142,[329]5.3193,[330]5.3212,[331]5.3314,[332]5.3279,[333]5.3357,[334]5.3307,[335]5.3255,[336]5.3274,[337]5.3266,[338]5.3263,[339]5.3221,[340]5.3195,[341]5.3263,[342]5.3292,[343
]5.3335,[344]5.3339,[345]5.3352,[346]5.3340,[347]5.3375,[348]5.3415,[349]5.3433,[350]5.3414,[351]5.3427,[352]5.3429,[353]5.3379,[354]5.3382,[355]5.3430,[356]5.3461,[357]5.3428,[358]5.3510,[359]5.3530,[360]5.3493,[361]5.3488,[362]5.3555,[363]5.3665,[364]5.3718,[365]5.3760,[366]5.3778,[367]5.3865,[368]5.3842,[369]5.3854,[370]5.3876,[371]5.3836,[372]5.3881,[373]5.3922,[374]5.3902,[375]5.3894,[376]5.3951,[377]5.3916,[378]5.3940,[379]5.3979,[380]5.3913,[381]5.3884,[382]5.3840,[383]5.3819,[384]5.3818,[385]5.3806,[386]5.3793,[387]5.3791,[388]5.3760,[389]5.3722,[390]5.3668,[391]5.3610,[392]5.3575,[393]5.3573,[394]5.3604,[395]5.3595,[396]5.3542,[397]5.3609,[398]5.3652,[399]5.3722,[400]5.3713,[401]5.3718,[402]5.3730,[403]5.3755,[404]5.3808,[405]5.3658,[406]5.3621,[407]5.3610,[408]5.3620,[409]5.3733,[410]5.3826,[411]5.3923,[412]5.4064,[413]5.4165,[414]5.4232,[415]5.4294,[416]5.4367,[417]5.4464,[418]5.4486,[419]5.4536,[420]5.4616,[421]5.4715,[422]5.4750,[423]5.4805,[424]5.4898,[425]5.4975,[426]5.5039,[427]5.5079,[428]5.5154,[429]5.5193,[430]5.5253,[431]5.5381,[432]5.5412,[433]5.5405,[434]5.5370,[435]5.5381,[436]5.5409,[437]5.5492,[438]5.5565,[439]5.5538,[440]5.5530,[441]5.5485,[442]5.5475,[443]5.5485,[444]5.5500,[445]5.5493,[446]5.5512,[447]5.5537,[448]5.5567,[449]5.5551,[450]5.5561,[451]5.5532,[452]5.5377,[453]5.5286,[454]5.5231,[455]5.5236,[456]5.5276,[457]5.5289,[458]5.5270,[459]5.5266,[460]5.5340,[461]5.5303,[462]5.5263,[463]5.5246,[464]5.5241,[465]5.5217,[466]5.5141,[467]5.5130,[468]5.5110,[469]5.5120,[470]5.5109,[471]5.5061,[472]5.5068,[473]5.5021,[474]5.5013,[475]5.4944,[476]5.4921,[477]5.4838,[478]5.4812,[479]5.4817,[480]5.4845,[481]5.4848,[482]5.4800,[483]5.4760,[484]5.4772,[485]5.4707,[486]5.4642,[487]5.4635,[488]5.4607,[489]5.4555,[490]5.4523,[491]5.4489,[492]5.4425,[493]5.4393,[494]5.4377,[495]5.4357,[496]5.4320,[497]5.4259,[498]5.4235,[499]5.4200,[500]5.4119,[501]5.4049,[502]5.4037,[503]5.4026,[504]5.3948,[505]5.3949,[506]5.3955,[507]5.3901,[508]5.3862,[509]5.3865,
[510]5.3889,[511]5.3932,[512]5.3973,[513]5.3999,[514]5.4054,[515]5.4012,[516]5.4001,[517]5.4001,[518]5.4002,[519]5.4022,[520]5.4034,[521]5.4044,[522]5.4058,[523]5.4065,[524]5.4120,[525]5.4149,[526]5.4152,[527]5.4169,[528]5.4113,[529]5.4126,[530]5.4087,[531]5.4081,[532]5.4130,[533]5.4158,[534]5.4138,[535]5.4161,[536]5.4116,[537]5.4098,[538]5.4149,[539]5.4156,[540]5.4174,[541]5.4173,[542]5.4186,[543]5.4209,[544]5.4221,[545]5.4211,[546]5.4213,[547]5.4179,[548]5.4139,[549]5.4140,[550]5.4120,[551]5.4092,[552]5.4070,[553]5.4040,[554]5.4016,[555]5.3997,[556]5.3990,[557]5.4009,[558]5.3976,[559]5.3977,[560]5.3964,[561]5.3963,[562]5.3936,[563]5.3935,[564]5.3978,[565]5.3990,[566]5.3994,[567]5.3976,[568]5.3986,[569]5.3970,[570]5.3996,[571]5.4008,[572]5.4018,[573]5.4022,[574]5.3993,[575]5.3973,[576]5.3965,[577]5.3948,[578]5.3928,[579]5.3925,[580]5.3872,[581]5.3841,[582]5.3844,[583]5.3853,[584]5.3857,[585]5.3797,[586]5.3743,[587]5.3742,[588]5.3787,[589]5.3837,[590]5.3866,[591]5.3883,[592]5.3871,[593]5.3834,[594]5.3845,[595]5.3828,[596]5.3869,[597]5.3851,[598]5.3819,[599]5.3845,[600]5.3838,[601]5.3828,[602]5.3830,[603]5.3859,[604]5.3864,[605]5.3888,[606]5.3901,[607]5.3886,[608]5.3856,[609]5.3866,[610]5.3908,[611]5.3896,[612]5.3918,[613]5.3889,[614]5.3853,[615]5.3795,[616]5.3822,[617]5.3772,[618]5.3729,[619]5.3685,[620]5.3574,[621]5.3522,[622]5.3503,[623]5.3519,[624]5.3523,[625]5.3531,[626]5.3529,[627]5.3557,[628]5.3564,[629]5.3568,[630]5.3597,[631]5.3642,[632]5.3689,[633]5.3676,[634]5.3705,[635]5.3702,[636]5.3670,[637]5.3630,[638]5.3652,[639]5.3621,[640]5.3627,[641]5.3630,[642]5.3681,[643]5.3699,[644]5.3715,[645]5.3699,[646]5.3732,[647]5.3682,[648]5.3693,[649]5.3694,[650]5.3725,[651]5.3766,[652]5.3771,[653]5.3809,[654]5.3756,[655]5.3748,
llama_print_timings: load time = 5440.66 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1718144.60 ms / 335360 tokens ( 5.12 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1748887.76 ms
Q4_2 with "RMSE optimized"
main: seed = 1682201097
llama.cpp: loading model from ../models/13B/ggml-model-q4_2-rmse.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 7945693.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.82 seconds per pass - ETA 30 minutes
[1]3.7974,[2]4.2646,[3]5.0665,[4]5.4543,[5]5.6326,[6]5.5928,[7]5.7421,[8]5.8371,[9]6.0964,[10]6.3187,[11]6.5080,[12]6.5609,[13]6.5207,[14]6.6042,[15]6.8046,[16]6.4790,[17]6.3914,[18]6.3660,[19]6.0720,[20]6.0501,[21]5.9758,[22]5.7924,[23]5.7621,[24]5.6706,[25]5.6831,[26]5.5311,[27]5.3560,[28]5.2582,[29]5.1838,[30]5.0450,[31]5.0052,[32]5.0178,[33]4.9758,[34]5.0188,[35]5.0365,[36]5.0591,[37]5.0538,[38]5.0496,[39]5.0775,[40]5.1186,[41]5.1421,[42]5.1784,[43]5.1414,[44]5.1840,[45]5.1856,[46]5.1589,[47]5.1854,[48]5.1682,[49]5.1724,[50]5.1414,[51]5.1498,[52]5.1432,[53]5.1875,[54]5.1763,[55]5.1584,[56]5.1804,[57]5.1999,[58]5.2217,[59]5.2392,[60]5.2770,[61]5.2700,[62]5.3267,[63]5.3517,[64]5.3616,[65]5.3996,[66]5.3996,[67]5.4174,[68]5.4289,[69]5.4560,[70]5.4870,[71]5.5101,[72]5.5452,[73]5.5929,[74]5.6000,[75]5.6119,[76]5.6283,[77]5.6403,[78]5.6268,[79]5.6534,[80]5.6487,[81]5.6577,[82]5.6540,[83]5.6078,[84]5.5956,[85]5.5891,[86]5.5727,[87]5.5070,[88]5.4642,[89]5.4428,[90]5.4324,[91]5.4546,[92]5.4494,[93]5.4503,[94]5.4495,[95]5.4757,[96]5.4720,[97]5.4691,[98]5.4658,[99]5.4583,[100]5.4552,[101]5.4783,[102]5.4742,[103]5.4898,[104]5.4943,[105]5.4960,[106]5.5107,[107]5.5087,[108]5.5238,[109]5.5224,[110]5.5169,[111]5.5355,[112]5.5520,[113]5.5516,[114]5.5500,[115]5.5548,[116]5.5428,[117]5.5427,[118]5.5667,[119]5.5846,[120]5.6138,[121]5.6292,[122]5.6510,[123]5.6887,[124]5.7060,[125]5.7007,[126]5.7363,[127]5.7699,[128]5.7984,[129]5.7862,[130]5.7942,[131]5.7893,[132]5.7863,[133]5.7742,[134]5.7823,[135]5.7825,[136]5.7738,[137]5.7704,[138]5.7562,[139]5.7487,[140]5.7473,[141]5.7192,[142]5.7152,[143]5.6897,[144]5.6749,[145]5.6660,[146]5.6552,[147]5.6600,[148]5.6620,[149]5.6580,[150]5.6568,[151]5.6613,[152]5.6558,[153]5.6464,[154]5.6401,[155]5.6459,[156]5.6439,[157]5.6586,[158]5.6600,[159]5.6606,[160]5.6639,[161]5.6753,[162]5.6498,[163]5.6402,[164]5.6198,[165]5.5952,[166]5.5720,[167]5.5400,[168]5.5125,[169]5.4989,[170]5.4899,[171]5.4696,[172]5.4573,[173]5.4441,[174]5.4177,[175]5.3979,[176]5.3
846,[177]5.3679,[178]5.3480,[179]5.3347,[180]5.3271,[181]5.3106,[182]5.2948,[183]5.2828,[184]5.2818,[185]5.2745,[186]5.2757,[187]5.2811,[188]5.2784,[189]5.2949,[190]5.2950,[191]5.3120,[192]5.3254,[193]5.3404,[194]5.3512,[195]5.3707,[196]5.3817,[197]5.4010,[198]5.4146,[199]5.4166,[200]5.4169,[201]5.4101,[202]5.4222,[203]5.4281,[204]5.4239,[205]5.4333,[206]5.4383,[207]5.4346,[208]5.4401,[209]5.4432,[210]5.4486,[211]5.4589,[212]5.4650,[213]5.4740,[214]5.4774,[215]5.4808,[216]5.4929,[217]5.5096,[218]5.5228,[219]5.5222,[220]5.5189,[221]5.5145,[222]5.5149,[223]5.5084,[224]5.5018,[225]5.4987,[226]5.5186,[227]5.5245,[228]5.5314,[229]5.5384,[230]5.5345,[231]5.5500,[232]5.5399,[233]5.5251,[234]5.5108,[235]5.4892,[236]5.4838,[237]5.4751,[238]5.4784,[239]5.4671,[240]5.4581,[241]5.4612,[242]5.4628,[243]5.4617,[244]5.4518,[245]5.4480,[246]5.4379,[247]5.4277,[248]5.4214,[249]5.4182,[250]5.4218,[251]5.4138,[252]5.4086,[253]5.3996,[254]5.3949,[255]5.3856,[256]5.3694,[257]5.3592,[258]5.3526,[259]5.3520,[260]5.3437,[261]5.3393,[262]5.3353,[263]5.3307,[264]5.3082,[265]5.3083,[266]5.3054,[267]5.2992,[268]5.3056,[269]5.3049,[270]5.3058,[271]5.3119,[272]5.3145,[273]5.3162,[274]5.3174,[275]5.3236,[276]5.3291,[277]5.3414,[278]5.3498,[279]5.3580,[280]5.3615,[281]5.3713,[282]5.3766,[283]5.3891,[284]5.3978,[285]5.4056,[286]5.4176,[287]5.4145,[288]5.4202,[289]5.4138,[290]5.3997,[291]5.3867,[292]5.3732,[293]5.3612,[294]5.3617,[295]5.3615,[296]5.3662,[297]5.3651,[298]5.3672,[299]5.3648,[300]5.3561,[301]5.3565,[302]5.3502,[303]5.3419,[304]5.3347,[305]5.3321,[306]5.3217,[307]5.3243,[308]5.3253,[309]5.3121,[310]5.3090,[311]5.3051,[312]5.3067,[313]5.3007,[314]5.2989,[315]5.2859,[316]5.2826,[317]5.2702,[318]5.2541,[319]5.2644,[320]5.2756,[321]5.2798,[322]5.2765,[323]5.2701,[324]5.2682,[325]5.2778,[326]5.2793,[327]5.2801,[328]5.2837,[329]5.2886,[330]5.2904,[331]5.3003,[332]5.2967,[333]5.3041,[334]5.2994,[335]5.2942,[336]5.2966,[337]5.2955,[338]5.2950,[339]5.2905,[340]5.2879,[341]5.2944,[342]5.2976,[343
]5.3016,[344]5.3019,[345]5.3031,[346]5.3017,[347]5.3049,[348]5.3088,[349]5.3105,[350]5.3083,[351]5.3096,[352]5.3097,[353]5.3047,[354]5.3053,[355]5.3101,[356]5.3130,[357]5.3100,[358]5.3183,[359]5.3204,[360]5.3169,[361]5.3166,[362]5.3235,[363]5.3343,[364]5.3396,[365]5.3436,[366]5.3453,[367]5.3538,[368]5.3515,[369]5.3529,[370]5.3550,[371]5.3511,[372]5.3558,[373]5.3599,[374]5.3579,[375]5.3574,[376]5.3630,[377]5.3598,[378]5.3624,[379]5.3664,[380]5.3598,[381]5.3571,[382]5.3529,[383]5.3510,[384]5.3510,[385]5.3499,[386]5.3487,[387]5.3484,[388]5.3451,[389]5.3414,[390]5.3360,[391]5.3301,[392]5.3264,[393]5.3259,[394]5.3289,[395]5.3282,[396]5.3230,[397]5.3294,[398]5.3338,[399]5.3407,[400]5.3400,[401]5.3408,[402]5.3420,[403]5.3443,[404]5.3496,[405]5.3342,[406]5.3301,[407]5.3290,[408]5.3302,[409]5.3416,[410]5.3508,[411]5.3605,[412]5.3747,[413]5.3847,[414]5.3911,[415]5.3972,[416]5.4047,[417]5.4145,[418]5.4168,[419]5.4218,[420]5.4298,[421]5.4394,[422]5.4428,[423]5.4487,[424]5.4579,[425]5.4656,[426]5.4720,[427]5.4762,[428]5.4835,[429]5.4871,[430]5.4934,[431]5.5062,[432]5.5095,[433]5.5085,[434]5.5052,[435]5.5065,[436]5.5093,[437]5.5174,[438]5.5249,[439]5.5223,[440]5.5217,[441]5.5172,[442]5.5159,[443]5.5174,[444]5.5194,[445]5.5187,[446]5.5207,[447]5.5230,[448]5.5261,[449]5.5245,[450]5.5256,[451]5.5226,[452]5.5073,[453]5.4981,[454]5.4928,[455]5.4932,[456]5.4970,[457]5.4984,[458]5.4967,[459]5.4963,[460]5.5037,[461]5.4998,[462]5.4959,[463]5.4946,[464]5.4942,[465]5.4919,[466]5.4845,[467]5.4836,[468]5.4820,[469]5.4832,[470]5.4824,[471]5.4775,[472]5.4784,[473]5.4736,[474]5.4727,[475]5.4658,[476]5.4636,[477]5.4554,[478]5.4528,[479]5.4530,[480]5.4557,[481]5.4558,[482]5.4511,[483]5.4469,[484]5.4479,[485]5.4411,[486]5.4348,[487]5.4338,[488]5.4310,[489]5.4258,[490]5.4225,[491]5.4188,[492]5.4122,[493]5.4094,[494]5.4080,[495]5.4060,[496]5.4021,[497]5.3958,[498]5.3933,[499]5.3899,[500]5.3817,[501]5.3746,[502]5.3736,[503]5.3725,[504]5.3647,[505]5.3648,[506]5.3655,[507]5.3600,[508]5.3564,[509]5.3568,
[510]5.3590,[511]5.3631,[512]5.3671,[513]5.3696,[514]5.3750,[515]5.3710,[516]5.3699,[517]5.3697,[518]5.3698,[519]5.3718,[520]5.3729,[521]5.3741,[522]5.3754,[523]5.3761,[524]5.3814,[525]5.3843,[526]5.3846,[527]5.3860,[528]5.3806,[529]5.3815,[530]5.3776,[531]5.3771,[532]5.3820,[533]5.3846,[534]5.3827,[535]5.3849,[536]5.3806,[537]5.3789,[538]5.3839,[539]5.3847,[540]5.3868,[541]5.3868,[542]5.3880,[543]5.3901,[544]5.3915,[545]5.3905,[546]5.3910,[547]5.3877,[548]5.3837,[549]5.3839,[550]5.3820,[551]5.3793,[552]5.3773,[553]5.3743,[554]5.3719,[555]5.3700,[556]5.3693,[557]5.3712,[558]5.3679,[559]5.3682,[560]5.3669,[561]5.3670,[562]5.3643,[563]5.3642,[564]5.3683,[565]5.3694,[566]5.3699,[567]5.3680,[568]5.3690,[569]5.3675,[570]5.3702,[571]5.3714,[572]5.3723,[573]5.3727,[574]5.3699,[575]5.3682,[576]5.3677,[577]5.3661,[578]5.3643,[579]5.3642,[580]5.3591,[581]5.3560,[582]5.3562,[583]5.3570,[584]5.3575,[585]5.3517,[586]5.3464,[587]5.3463,[588]5.3507,[589]5.3556,[590]5.3585,[591]5.3603,[592]5.3591,[593]5.3554,[594]5.3566,[595]5.3550,[596]5.3592,[597]5.3573,[598]5.3541,[599]5.3567,[600]5.3558,[601]5.3547,[602]5.3549,[603]5.3577,[604]5.3582,[605]5.3606,[606]5.3618,[607]5.3602,[608]5.3575,[609]5.3583,[610]5.3623,[611]5.3611,[612]5.3633,[613]5.3607,[614]5.3570,[615]5.3512,[616]5.3537,[617]5.3487,[618]5.3444,[619]5.3400,[620]5.3291,[621]5.3239,[622]5.3220,[623]5.3233,[624]5.3239,[625]5.3246,[626]5.3242,[627]5.3270,[628]5.3278,[629]5.3284,[630]5.3314,[631]5.3359,[632]5.3406,[633]5.3394,[634]5.3423,[635]5.3419,[636]5.3384,[637]5.3346,[638]5.3369,[639]5.3337,[640]5.3343,[641]5.3347,[642]5.3398,[643]5.3415,[644]5.3432,[645]5.3418,[646]5.3454,[647]5.3404,[648]5.3415,[649]5.3416,[650]5.3448,[651]5.3489,[652]5.3493,[653]5.3531,[654]5.3476,[655]5.3468,
llama_print_timings: load time = 6768.59 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1761526.86 ms / 335360 tokens ( 5.25 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1792971.87 ms
Q4_2 with "Full Range"
main: seed = 1682199294
llama.cpp: loading model from ../models/13B/ggml-model-q4_2-rf.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 7945693.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.83 seconds per pass - ETA 30 minutes
[1]3.7458,[2]4.2361,[3]5.0416,[4]5.4255,[5]5.6009,[6]5.5469,[7]5.6889,[8]5.7965,[9]6.0661,[10]6.2818,[11]6.4702,[12]6.5206,[13]6.4869,[14]6.5784,[15]6.7809,[16]6.4660,[17]6.3873,[18]6.3628,[19]6.0707,[20]6.0525,[21]5.9797,[22]5.8006,[23]5.7732,[24]5.6818,[25]5.6933,[26]5.5435,[27]5.3662,[28]5.2687,[29]5.1951,[30]5.0582,[31]5.0202,[32]5.0342,[33]4.9912,[34]5.0337,[35]5.0518,[36]5.0735,[37]5.0668,[38]5.0661,[39]5.0930,[40]5.1358,[41]5.1586,[42]5.1941,[43]5.1576,[44]5.1991,[45]5.2018,[46]5.1749,[47]5.2047,[48]5.1886,[49]5.1917,[50]5.1601,[51]5.1686,[52]5.1600,[53]5.2046,[54]5.1938,[55]5.1742,[56]5.1950,[57]5.2135,[58]5.2368,[59]5.2544,[60]5.2907,[61]5.2846,[62]5.3398,[63]5.3645,[64]5.3742,[65]5.4106,[66]5.4098,[67]5.4283,[68]5.4393,[69]5.4650,[70]5.4953,[71]5.5180,[72]5.5526,[73]5.6000,[74]5.6081,[75]5.6182,[76]5.6333,[77]5.6433,[78]5.6294,[79]5.6561,[80]5.6507,[81]5.6596,[82]5.6566,[83]5.6099,[84]5.5987,[85]5.5914,[86]5.5748,[87]5.5089,[88]5.4663,[89]5.4450,[90]5.4355,[91]5.4564,[92]5.4526,[93]5.4533,[94]5.4518,[95]5.4781,[96]5.4749,[97]5.4716,[98]5.4688,[99]5.4604,[100]5.4569,[101]5.4801,[102]5.4758,[103]5.4914,[104]5.4960,[105]5.4969,[106]5.5106,[107]5.5085,[108]5.5239,[109]5.5231,[110]5.5174,[111]5.5362,[112]5.5533,[113]5.5529,[114]5.5512,[115]5.5553,[116]5.5436,[117]5.5427,[118]5.5660,[119]5.5845,[120]5.6131,[121]5.6295,[122]5.6509,[123]5.6878,[124]5.7055,[125]5.7005,[126]5.7364,[127]5.7694,[128]5.7977,[129]5.7859,[130]5.7945,[131]5.7896,[132]5.7867,[133]5.7743,[134]5.7823,[135]5.7825,[136]5.7732,[137]5.7692,[138]5.7552,[139]5.7471,[140]5.7455,[141]5.7181,[142]5.7137,[143]5.6884,[144]5.6732,[145]5.6643,[146]5.6535,[147]5.6587,[148]5.6613,[149]5.6574,[150]5.6566,[151]5.6611,[152]5.6553,[153]5.6454,[154]5.6391,[155]5.6456,[156]5.6433,[157]5.6589,[158]5.6608,[159]5.6613,[160]5.6649,[161]5.6758,[162]5.6504,[163]5.6412,[164]5.6212,[165]5.5961,[166]5.5733,[167]5.5412,[168]5.5137,[169]5.5004,[170]5.4921,[171]5.4713,[172]5.4587,[173]5.4463,[174]5.4196,[175]5.3997,[176]5.3
863,[177]5.3695,[178]5.3496,[179]5.3363,[180]5.3293,[181]5.3131,[182]5.2970,[183]5.2849,[184]5.2839,[185]5.2763,[186]5.2775,[187]5.2833,[188]5.2806,[189]5.2965,[190]5.2968,[191]5.3138,[192]5.3274,[193]5.3421,[194]5.3529,[195]5.3725,[196]5.3837,[197]5.4028,[198]5.4162,[199]5.4180,[200]5.4186,[201]5.4122,[202]5.4247,[203]5.4301,[204]5.4256,[205]5.4347,[206]5.4397,[207]5.4360,[208]5.4420,[209]5.4453,[210]5.4510,[211]5.4615,[212]5.4677,[213]5.4767,[214]5.4794,[215]5.4829,[216]5.4953,[217]5.5115,[218]5.5250,[219]5.5242,[220]5.5208,[221]5.5161,[222]5.5162,[223]5.5096,[224]5.5026,[225]5.4996,[226]5.5193,[227]5.5241,[228]5.5314,[229]5.5382,[230]5.5343,[231]5.5501,[232]5.5394,[233]5.5247,[234]5.5101,[235]5.4883,[236]5.4825,[237]5.4736,[238]5.4770,[239]5.4656,[240]5.4568,[241]5.4598,[242]5.4614,[243]5.4603,[244]5.4505,[245]5.4471,[246]5.4371,[247]5.4270,[248]5.4205,[249]5.4173,[250]5.4209,[251]5.4129,[252]5.4079,[253]5.3989,[254]5.3943,[255]5.3852,[256]5.3688,[257]5.3587,[258]5.3521,[259]5.3516,[260]5.3431,[261]5.3382,[262]5.3343,[263]5.3297,[264]5.3069,[265]5.3070,[266]5.3043,[267]5.2980,[268]5.3045,[269]5.3038,[270]5.3046,[271]5.3109,[272]5.3137,[273]5.3150,[274]5.3164,[275]5.3223,[276]5.3281,[277]5.3406,[278]5.3492,[279]5.3573,[280]5.3608,[281]5.3706,[282]5.3758,[283]5.3884,[284]5.3969,[285]5.4047,[286]5.4171,[287]5.4136,[288]5.4191,[289]5.4126,[290]5.3986,[291]5.3856,[292]5.3723,[293]5.3603,[294]5.3607,[295]5.3609,[296]5.3656,[297]5.3645,[298]5.3665,[299]5.3641,[300]5.3553,[301]5.3555,[302]5.3492,[303]5.3412,[304]5.3339,[305]5.3311,[306]5.3203,[307]5.3228,[308]5.3235,[309]5.3103,[310]5.3073,[311]5.3032,[312]5.3045,[313]5.2991,[314]5.2975,[315]5.2847,[316]5.2816,[317]5.2689,[318]5.2529,[319]5.2631,[320]5.2746,[321]5.2789,[322]5.2756,[323]5.2693,[324]5.2675,[325]5.2768,[326]5.2784,[327]5.2792,[328]5.2828,[329]5.2879,[330]5.2897,[331]5.3000,[332]5.2963,[333]5.3039,[334]5.2991,[335]5.2940,[336]5.2963,[337]5.2954,[338]5.2948,[339]5.2905,[340]5.2881,[341]5.2948,[342]5.2978,[343
]5.3021,[344]5.3023,[345]5.3036,[346]5.3021,[347]5.3056,[348]5.3094,[349]5.3114,[350]5.3093,[351]5.3107,[352]5.3107,[353]5.3057,[354]5.3060,[355]5.3107,[356]5.3136,[357]5.3104,[358]5.3185,[359]5.3206,[360]5.3171,[361]5.3166,[362]5.3234,[363]5.3343,[364]5.3395,[365]5.3433,[366]5.3453,[367]5.3539,[368]5.3517,[369]5.3531,[370]5.3552,[371]5.3512,[372]5.3560,[373]5.3601,[374]5.3582,[375]5.3578,[376]5.3634,[377]5.3600,[378]5.3627,[379]5.3666,[380]5.3600,[381]5.3571,[382]5.3529,[383]5.3509,[384]5.3510,[385]5.3498,[386]5.3486,[387]5.3483,[388]5.3451,[389]5.3414,[390]5.3361,[391]5.3305,[392]5.3269,[393]5.3265,[394]5.3297,[395]5.3289,[396]5.3236,[397]5.3303,[398]5.3345,[399]5.3416,[400]5.3408,[401]5.3414,[402]5.3426,[403]5.3451,[404]5.3505,[405]5.3353,[406]5.3312,[407]5.3301,[408]5.3312,[409]5.3423,[410]5.3515,[411]5.3612,[412]5.3754,[413]5.3857,[414]5.3922,[415]5.3981,[416]5.4055,[417]5.4150,[418]5.4172,[419]5.4221,[420]5.4301,[421]5.4399,[422]5.4433,[423]5.4489,[424]5.4581,[425]5.4657,[426]5.4720,[427]5.4760,[428]5.4834,[429]5.4873,[430]5.4934,[431]5.5062,[432]5.5094,[433]5.5085,[434]5.5052,[435]5.5066,[436]5.5093,[437]5.5176,[438]5.5251,[439]5.5224,[440]5.5217,[441]5.5172,[442]5.5160,[443]5.5173,[444]5.5191,[445]5.5184,[446]5.5205,[447]5.5229,[448]5.5260,[449]5.5244,[450]5.5254,[451]5.5224,[452]5.5072,[453]5.4979,[454]5.4923,[455]5.4928,[456]5.4966,[457]5.4979,[458]5.4959,[459]5.4954,[460]5.5028,[461]5.4991,[462]5.4953,[463]5.4933,[464]5.4929,[465]5.4906,[466]5.4831,[467]5.4818,[468]5.4800,[469]5.4810,[470]5.4800,[471]5.4749,[472]5.4756,[473]5.4709,[474]5.4701,[475]5.4631,[476]5.4607,[477]5.4526,[478]5.4501,[479]5.4506,[480]5.4534,[481]5.4536,[482]5.4490,[483]5.4449,[484]5.4458,[485]5.4390,[486]5.4326,[487]5.4317,[488]5.4290,[489]5.4238,[490]5.4203,[491]5.4168,[492]5.4102,[493]5.4071,[494]5.4055,[495]5.4035,[496]5.3996,[497]5.3934,[498]5.3908,[499]5.3873,[500]5.3793,[501]5.3722,[502]5.3711,[503]5.3700,[504]5.3623,[505]5.3624,[506]5.3630,[507]5.3575,[508]5.3538,[509]5.3543,
[510]5.3565,[511]5.3608,[512]5.3648,[513]5.3672,[514]5.3727,[515]5.3686,[516]5.3675,[517]5.3674,[518]5.3676,[519]5.3697,[520]5.3708,[521]5.3718,[522]5.3732,[523]5.3739,[524]5.3794,[525]5.3822,[526]5.3825,[527]5.3840,[528]5.3785,[529]5.3795,[530]5.3757,[531]5.3754,[532]5.3803,[533]5.3830,[534]5.3811,[535]5.3834,[536]5.3790,[537]5.3772,[538]5.3821,[539]5.3828,[540]5.3847,[541]5.3845,[542]5.3858,[543]5.3880,[544]5.3893,[545]5.3882,[546]5.3885,[547]5.3852,[548]5.3811,[549]5.3812,[550]5.3792,[551]5.3765,[552]5.3744,[553]5.3714,[554]5.3692,[555]5.3673,[556]5.3666,[557]5.3683,[558]5.3650,[559]5.3652,[560]5.3639,[561]5.3639,[562]5.3612,[563]5.3609,[564]5.3652,[565]5.3663,[566]5.3668,[567]5.3649,[568]5.3658,[569]5.3643,[570]5.3670,[571]5.3682,[572]5.3691,[573]5.3695,[574]5.3666,[575]5.3647,[576]5.3640,[577]5.3623,[578]5.3604,[579]5.3602,[580]5.3549,[581]5.3518,[582]5.3519,[583]5.3527,[584]5.3533,[585]5.3474,[586]5.3419,[587]5.3419,[588]5.3463,[589]5.3512,[590]5.3541,[591]5.3556,[592]5.3546,[593]5.3507,[594]5.3520,[595]5.3505,[596]5.3547,[597]5.3530,[598]5.3497,[599]5.3522,[600]5.3513,[601]5.3502,[602]5.3502,[603]5.3531,[604]5.3536,[605]5.3559,[606]5.3572,[607]5.3557,[608]5.3529,[609]5.3538,[610]5.3578,[611]5.3567,[612]5.3590,[613]5.3561,[614]5.3524,[615]5.3467,[616]5.3496,[617]5.3446,[618]5.3404,[619]5.3360,[620]5.3252,[621]5.3201,[622]5.3182,[623]5.3196,[624]5.3201,[625]5.3209,[626]5.3206,[627]5.3233,[628]5.3240,[629]5.3245,[630]5.3275,[631]5.3320,[632]5.3367,[633]5.3355,[634]5.3385,[635]5.3382,[636]5.3349,[637]5.3311,[638]5.3332,[639]5.3301,[640]5.3308,[641]5.3311,[642]5.3362,[643]5.3379,[644]5.3397,[645]5.3383,[646]5.3417,[647]5.3367,[648]5.3378,[649]5.3380,[650]5.3410,[651]5.3451,[652]5.3455,[653]5.3493,[654]5.3439,[655]5.3433,
llama_print_timings: load time = 6786.22 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1769291.10 ms / 335360 tokens ( 5.28 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1802705.78 ms
@ggerganov Rebased #1106 on latest master, re-quantized, and started a `Q4_0` perplexity run. My RMSE-optimized results look completely different from what you report for #1106 above. At this point I'm here: `[120]6.4905`, so more than 0.08 lower compared to the above.
@ggerganov, you are right. The approach in this PR (full range) really is better for perplexity than RMSE minimization as per #1106 for `Q4_0` and `Q4_2`. It is also better than mean absolute difference (MAD) minimization, and better than a constant scaling factor different from the `-8 / max` used here, as suggested elsewhere by @sw (well, at least for scaling factors in the `-8.5 / max ... -7.5 / max` range, which I checked in steps of `0.1 / max` by running perplexity up to 150 chunks; from these runs it seems unlikely that going outside this range will help, as perplexity increases as we move away from `-8 / max`).
However, for `Q4_1` and `Q4_3`, where "full range" is equivalent to the reference implementation, RMSE minimization is better than any of the above. Both the reference and the RMSE-minimized `Q4_1` are quite a bit better than `Q4_2` (and of course `Q4_0`), so it seems important to take the asymmetry of the model weights into account when quantizing.
@ikawrakow that is surprising to me; so far the RMSE has mapped fairly well to the performance of different quantization methods. Very interesting result!
I think there is a good argument for using this method as the default then.
Besides measuring actual activations like GPTQ, one other error measure I've seen mentioned is cosine similarity. E.g. https://proceedings.neurips.cc/paper/2018/file/e82c4b19b8151ddc25d4d93baf7b908f-Paper.pdf says:

> It is well known that when batch norm is applied after a convolution layer, the output is invariant to the norm of the weight on the proceeding layer [11] i.e., BN(C · W · x) = BN(W · x) for any given constant C. This quantity is often described geometrically as the norm of the weight tensor, and in the presence of this invariance, the only measure that needs to be preserved upon quantization is the directionality of the weight tensor. In the following we show that quantization preserves the direction (angle) of high-dimensional vectors when W follows a Gaussian distribution. More specifically, for networks with M-bit fixed point representation, the angle is preserved when the number of quantization levels 2^M is much larger than √(2 ln N), where N is the size of the quantized vector. This shows that significant quantization is possible on practical settings. Taking for example the dimensionality of the joint product in a batch with 1024 examples corresponding to the last layer of ResNet-50, we need no more than 8-bit of precision to preserve the angle well (i.e., √(2 ln(3 · 3 · 2048 · 1024)) = 5.7 ≪ 2^8). We stress that this result heavily relies on values being distributed according to a Gaussian distribution, and suggests why some vectors are robust to quantization (e.g., weights and activations) while others are more fragile (e.g., gradients).
It might make sense that, if preserving the angle is what matters most, the largest value affects it the most and should be preserved exactly (and `q4_1` preserves both the largest positive and the largest negative value). The "tie break" @sw tested for the case where there are negative and positive values of the same magnitude might still be beneficial then.
@unbounded Thanks for the suggestion, but the cosine similarity cannot be used to define the scale of a bucket (it cancels out), else I would have picked that (or the closely related correlation coefficient). We are quantizing "buckets", not whole layers, and my guess would be that activations in the next layer do depend on the relative magnitude of the quantized weights in the buckets (although they are independent of the overall scale of the layer, as per the paper you linked). What has been done so far here in `llama.cpp` is just that: quantize "buckets" independently of each other (at least this is what I have seen). It is very likely that in order to improve compared to `Q4_3`, one needs to either use more bits or go beyond independently quantizing "buckets" of weights. If I pick a random layer and look at it in `ImageJ`, my guess is that something like SVD should be able to do a good job representing this data with a fraction of the bits.
But yes, I agree, RMSE minimization being worse than "full range" was a surprise for me too. My simple-minded explanation is that a) the asymmetry around 0 does matter (as we see from the much better results for `Q4_1` and `Q4_3`), and b) when we are ignoring the asymmetry, it is more important to put the max (or min) of the quantized weights at the same spot as the actual max (min) than it is to find some kind of best overall match.
Here, as an example, `layers.0.attention.wk.weight`:
@unbounded Let's finalize this PR, including `Q4_2`, and merge it. Let me know if you don't have the time and I will finish it.
Also, for now disable the `Q4_2` RMSE optimization here and use the reference:
- https://github.com/ggerganov/llama.cpp/blob/957c8ae21d1e7052ea45a40ee8c0407b909e90cc/ggml.c#L1810
- https://github.com/ggerganov/llama.cpp/blob/957c8ae21d1e7052ea45a40ee8c0407b909e90cc/ggml.c#L12142-L12143
- https://github.com/ggerganov/llama.cpp/blob/957c8ae21d1e7052ea45a40ee8c0407b909e90cc/ggml.c#L1257-L1259
Updated for `q4_2`
Thanks
Only need to decide whether we bump the version and print a warning message telling `Q4_0` v1 model users to update in order to get better results, or we just silently merge, since `Q4_2` is better anyway and we expect everyone to start using it over `Q4_0` once it is finalized.
P.S. I think the latter is better, but I want to hear if there are other opinions.
I feel like a version bump is unnecessary at this point; if people have a model they're happy with, we don't need to prod them to upgrade. If there's a version bump in the future, maybe we could add a metadata header where information like this can be stored.
So, one can indeed improve `Q4_0` and `Q4_2` via RMSE minimization. One just needs weighted minimization, with higher importance given to model weights with larger absolute values. The perplexity delta is not earth-shattering (-0.02 for `Q4_0`, -0.01 for `Q4_2`), so probably not worth it, especially considering that `Q4_1` and `Q4_3` give significantly lower perplexity compared to `Q4_0` and `Q4_2`. But this confirms my hypothesis above: when ignoring the asymmetry in the distribution of the model weights, as in `Q4_0` and `Q4_2`, it is more important to better match the higher absolute value weights than it is to achieve a better "overall" match within a bucket.
To follow up on the second hypothesis I made above (one needs more bits to match `fp16` model perplexity): a 5-bit model à la `Q4_1` achieves a 7B perplexity of 5.9769 with the `output.weight` tensor quantized, and 5.9721 with `output.weight` not quantized (these results are with RMSE minimization; the perplexity without RMSE minimization is 5.9935). Such a model is exactly the same size as `Q4_1` and `Q4_3` (using "buckets" of 32 values and two `fp16` floats per bucket, so effectively 6 bits per model weight). It runs at the same speed for perplexity calculations and is only slightly slower for token evaluation (~55 ms/token on my M2 Mac). It is interesting to note that the difference between quantizing and not quantizing the `output.weight` tensor is now much smaller. This kind of makes sense: the 5-bit quantization error of this tensor is much smaller, so the gain from not quantizing it is much less. For the 13B model we get a perplexity of 5.2726 with a quantized `output.weight` tensor, so the delta to the full model shrinks to 0.027.
My third hypothesis (one can accurately represent the model weights via an SVD decomposition using fewer bits) seems to be incorrect in general. It does work pretty well for the attention-related tensors of the first layer, but that's about it.