
Add AVX2 implementation of quantize_row_q4_1

slaren opened this issue 1 year ago • 7 comments

Largely based on the AVX2 implementation of quantize_row_q4_0.
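
For reference, q4_1 stores each block of 32 weights as a float scale, a float minimum, and 16 bytes of packed nibbles. Below is a minimal scalar sketch of that quantization (the logic the AVX2 path vectorizes), assuming the QK = 32 block layout ggml used at the time; the names and layout here are illustrative, not copied from the PR.

```cpp
#include <algorithm>
#include <cstdint>

#define QK 32 // assumed ggml block size of this era

// Assumed block layout for q4_1: scale, minimum, packed 4-bit values.
struct block_q4_1 {
    float   d;          // delta (scale)
    float   m;          // minimum of the block
    uint8_t qs[QK / 2]; // two nibbles per byte, low nibble first
};

// Scalar sketch: map each x to round((x - min) / d), clamped to [0, 15].
static void quantize_row_q4_1_scalar(const float * x, block_q4_1 * y, int k) {
    const int nb = k / QK;
    for (int i = 0; i < nb; i++) {
        float vmin = x[i*QK];
        float vmax = x[i*QK];
        for (int l = 1; l < QK; l++) {
            vmin = std::min(vmin, x[i*QK + l]);
            vmax = std::max(vmax, x[i*QK + l]);
        }
        const float d  = (vmax - vmin) / 15.0f; // 16 representable levels
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        y[i].m = vmin;
        for (int l = 0; l < QK; l += 2) {
            const uint8_t v0 = (uint8_t) std::min(15.0f, (x[i*QK + l + 0] - vmin)*id + 0.5f);
            const uint8_t v1 = (uint8_t) std::min(15.0f, (x[i*QK + l + 1] - vmin)*id + 0.5f);
            y[i].qs[l/2] = v0 | (v1 << 4);
        }
    }
}
```

The AVX2 version presumably replaces the min/max scan and the scale-and-round loop with 256-bit vector operations, which is where the roughly 9x speedup in the benchmark below comes from.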

Run on (16 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.17, 1.04, 1.50
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
BM_quantize_row_q4_1_ref      12845 ns        12845 ns        54677
BM_quantize_row_q4_1_avx       1360 ns         1360 ns       519134
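
The table above is in Google Benchmark's output format; a hedged sketch of how such a micro-benchmark pair could be set up follows (the function signatures and the 4096-element row size are assumptions, and the two quantize_row_q4_1 variants are stand-in declarations, not the PR's actual harness):

```cpp
#include <benchmark/benchmark.h>
#include <cstdint>
#include <vector>

// Hypothetical declarations standing in for the two implementations under test.
void quantize_row_q4_1_reference(const float * x, void * y, int k);
void quantize_row_q4_1_avx2     (const float * x, void * y, int k);

static void BM_quantize_row_q4_1_ref(benchmark::State & state) {
    std::vector<float>   x(4096, 0.5f); // assumed row length
    std::vector<uint8_t> y(4096);       // generous output buffer
    for (auto _ : state) {
        quantize_row_q4_1_reference(x.data(), y.data(), (int) x.size());
        benchmark::DoNotOptimize(y.data()); // keep the work observable
    }
}
BENCHMARK(BM_quantize_row_q4_1_ref);

static void BM_quantize_row_q4_1_avx(benchmark::State & state) {
    std::vector<float>   x(4096, 0.5f);
    std::vector<uint8_t> y(4096);
    for (auto _ : state) {
        quantize_row_q4_1_avx2(x.data(), y.data(), (int) x.size());
        benchmark::DoNotOptimize(y.data());
    }
}
BENCHMARK(BM_quantize_row_q4_1_avx);

BENCHMARK_MAIN();
```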

slaren · Mar 26 '23 00:03

~~Perplexity after this change: 6.3029 (7B q4_1)~~

Full run output

./perplexity -m ./models/7B/ggml-model-q4_1.bin -f wikitext-2-raw/wiki.test.raw -t 12
main: seed = 1679789188
llama_model_load: loading model from './models/7B/ggml-model-q4_1.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 3
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml ctx size = 5076.59 MB
llama_model_load: mem required  = 6868.59 MB (+ 1026.00 MB per state)
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_1.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4820.52 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
59.76 seconds per pass - ETA 10.87 hours
[1]4.6106,[2]5.0788,[3]5.9975,[4]6.5975,[5]6.6904,[6]6.6597,[7]6.8588,[8]6.9568,[9]7.2826,[10]7.5468,[11]7.7747,[12]7.8253,[13]7.7676,[14]7.8484,[15]8.1075,[16]7.6907,[17]7.5501,[18]7.4949,[19]7.1007,[20]7.0787,[21]6.9834,[22]6.8058,[23]6.7749,[24]6.6862,[25]6.6797,[26]6.5124,[27]6.3226,[28]6.2171,[29]6.1203,[30]5.9528,[31]5.9206,[32]5.9367,[33]5.8739,[34]5.9077,[35]5.9296,[36]5.9729,[37]5.9745,[38]5.9824,[39]6.0189,[40]6.0655,[41]6.0861,[42]6.1289,[43]6.0895,[44]6.1479,[45]6.1487,[46]6.1206,[47]6.1384,[48]6.1157,[49]6.1142,[50]6.0721,[51]6.0641,[52]6.0524,[53]6.1027,[54]6.0832,[55]6.0599,[56]6.0930,[57]6.1117,[58]6.1306,[59]6.1457,[60]6.1885,[61]6.1751,[62]6.2314,[63]6.2605,[64]6.2724,[65]6.3186,[66]6.3298,[67]6.3479,[68]6.3603,[69]6.3840,[70]6.4122,[71]6.4332,[72]6.4649,[73]6.5236,[74]6.5269,[75]6.5418,[76]6.5553,[77]6.5675,[78]6.5542,[79]6.5846,[80]6.5776,[81]6.6003,[82]6.6076,[83]6.5525,[84]6.5364,[85]6.5249,[86]6.5017,[87]6.4426,[88]6.4180,[89]6.3960,[90]6.3812,[91]6.4069,[92]6.4003,[93]6.3985,[94]6.3942,[95]6.4252,[96]6.4245,[97]6.4218,[98]6.4135,[99]6.3976,[100]6.3935,[101]6.4180,[102]6.4125,[103]6.4327,[104]6.4431,[105]6.4418,[106]6.4598,[107]6.4601,[108]6.4743,[109]6.4664,[110]6.4615,[111]6.4824,[112]6.5048,[113]6.5083,[114]6.5039,[115]6.5092,[116]6.4988,[117]6.5055,[118]6.5340,[119]6.5562,[120]6.5922,[121]6.6074,[122]6.6315,[123]6.6712,[124]6.6899,[125]6.6808,[126]6.7218,[127]6.7585,[128]6.7905,[129]6.7743,[130]6.7853,[131]6.7815,[132]6.7721,[133]6.7581,[134]6.7678,[135]6.7641,[136]6.7522,[137]6.7453,[138]6.7293,[139]6.7193,[140]6.7153,[141]6.6870,[142]6.6843,[143]6.6561,[144]6.6345,[145]6.6274,[146]6.6161,[147]6.6203,[148]6.6217,[149]6.6186,[150]6.6148,[151]6.6188,[152]6.6069,[153]6.5908,[154]6.5822,[155]6.5901,[156]6.5849,[157]6.6010,[158]6.6043,[159]6.6100,[160]6.6120,[161]6.6245,[162]6.5950,[163]6.5826,[164]6.5577,[165]6.5254,[166]6.4974,[167]6.4577,[168]6.4266,[169]6.4141,[170]6.4023,[171]6.3752,[172]6.3569,[173]6.3402,[174]6.3098,[175]6.2891,[176]6.2767,[177]6.2560,[178]6.2328,[179]6.2153,[180]6.2052,[181]6.1829,[182]6.1645,[183]6.1500,[184]6.1489,[185]6.1412,[186]6.1421,[187]6.1483,[188]6.1446,[189]6.1635,[190]6.1652,[191]6.1868,[192]6.2028,[193]6.2202,[194]6.2317,[195]6.2534,[196]6.2696,[197]6.2906,[198]6.3066,[199]6.3100,[200]6.3153,[201]6.3098,[202]6.3299,[203]6.3383,[204]6.3390,[205]6.3506,[206]6.3581,[207]6.3548,[208]6.3644,[209]6.3688,[210]6.3730,[211]6.3837,[212]6.3928,[213]6.4027,[214]6.4066,[215]6.4093,[216]6.4233,[217]6.4425,[218]6.4568,[219]6.4577,[220]6.4536,[221]6.4471,[222]6.4452,[223]6.4342,[224]6.4271,[225]6.4234,[226]6.4442,[227]6.4537,[228]6.4598,[229]6.4659,[230]6.4627,[231]6.4787,[232]6.4669,[233]6.4494,[234]6.4334,[235]6.4167,[236]6.4103,[237]6.4003,[238]6.4026,[239]6.3865,[240]6.3748,[241]6.3770,[242]6.3800,[243]6.3775,[244]6.3658,[245]6.3624,[246]6.3510,[247]6.3379,[248]6.3295,[249]6.3259,[250]6.3301,[251]6.3234,[252]6.3190,[253]6.3097,[254]6.3042,[255]6.2931,[256]6.2743,[257]6.2611,[258]6.2515,[259]6.2487,[260]6.2395,[261]6.2342,[262]6.2286,[263]6.2225,[264]6.2029,[265]6.2027,[266]6.2019,[267]6.1946,[268]6.2032,[269]6.2024,[270]6.2017,[271]6.2097,[272]6.2133,[273]6.2134,[274]6.2153,[275]6.2246,[276]6.2307,[277]6.2459,[278]6.2562,[279]6.2648,[280]6.2675,[281]6.2778,[282]6.2834,[283]6.2984,
[284]6.3061,[285]6.3141,[286]6.3268,[287]6.3259,[288]6.3326,[289]6.3232,[290]6.3067,[291]6.2908,[292]6.2755,[293]6.2622,[294]6.2644,[295]6.2637,[296]6.2690,[297]6.2683,[298]6.2719,[299]6.2692,[300]6.2579,[301]6.2574,[302]6.2499,[303]6.2405,[304]6.2314,[305]6.2283,[306]6.2155,[307]6.2175,[308]6.2203,[309]6.2037,[310]6.1974,[311]6.1913,[312]6.1939,[313]6.1880,[314]6.1866,[315]6.1705,[316]6.1662,[317]6.1495,[318]6.1284,[319]6.1410,[320]6.1533,[321]6.1573,[322]6.1529,[323]6.1458,[324]6.1428,[325]6.1538,[326]6.1538,[327]6.1556,[328]6.1588,[329]6.1647,[330]6.1678,[331]6.1801,[332]6.1770,[333]6.1847,[334]6.1791,[335]6.1724,[336]6.1758,[337]6.1731,[338]6.1724,[339]6.1668,[340]6.1625,[341]6.1702,[342]6.1732,[343]6.1779,[344]6.1778,[345]6.1777,[346]6.1743,[347]6.1782,[348]6.1819,[349]6.1841,[350]6.1811,[351]6.1819,[352]6.1818,[353]6.1753,[354]6.1764,[355]6.1816,[356]6.1851,[357]6.1819,[358]6.1914,[359]6.1938,[360]6.1904,[361]6.1899,[362]6.1967,[363]6.2078,[364]6.2142,[365]6.2191,[366]6.2206,[367]6.2292,[368]6.2263,[369]6.2275,[370]6.2296,[371]6.2241,[372]6.2291,[373]6.2340,[374]6.2320,[375]6.2318,[376]6.2388,[377]6.2338,[378]6.2364,[379]6.2424,[380]6.2345,[381]6.2309,[382]6.2264,[383]6.2257,[384]6.2252,[385]6.2242,[386]6.2241,[387]6.2246,[388]6.2207,[389]6.2153,[390]6.2088,[391]6.2012,[392]6.1970,[393]6.1955,[394]6.1984,[395]6.1970,[396]6.1894,[397]6.1965,[398]6.2007,[399]6.2081,[400]6.2074,[401]6.2086,[402]6.2097,[403]6.2119,[404]6.2185,[405]6.2100,[406]6.2072,[407]6.2068,[408]6.2089,[409]6.2210,[410]6.2320,[411]6.2440,[412]6.2604,[413]6.2715,[414]6.2798,[415]6.2853,[416]6.2934,[417]6.3062,[418]6.3097,[419]6.3173,[420]6.3268,[421]6.3386,[422]6.3426,[423]6.3496,[424]6.3605,[425]6.3698,[426]6.3766,[427]6.3812,[428]6.3896,[429]6.3951,[430]6.4031,[431]6.4174,[432]6.4216,[433]6.4206,[434]6.4158,[435]6.4169,[436]6.4193,[437]6.4295,[438]6.4376,[439]6.4342,[440]6.4329,[441]6.4279,[442]6.4259,[443]6.4269,[444]6.4272,[445]6.4251,[446]6.4276,[447]6.4308,[448]6.4350,[449]6.4326,[450]6.4330,[451]6.4288,[452]6.4174,[453]6.4091,[454]6.4034,[455]6.4041,[456]6.4093,[457]6.4114,[458]6.4091,[459]6.4099,[460]6.4188,[461]6.4158,[462]6.4146,[463]6.4190,[464]6.4176,[465]6.4150,[466]6.4073,[467]6.4084,[468]6.4087,[469]6.4110,[470]6.4122,[471]6.4075,[472]6.4128,[473]6.4073,[474]6.4090,[475]6.4030,[476]6.4050,[477]6.3981,[478]6.3975,[479]6.4043,[480]6.4091,[481]6.4106,[482]6.4061,[483]6.4018,[484]6.4039,[485]6.4023,[486]6.3960,[487]6.3958,[488]6.3938,[489]6.3888,[490]6.3865,[491]6.3837,[492]6.3779,[493]6.3749,[494]6.3730,[495]6.3730,[496]6.3694,[497]6.3640,[498]6.3623,[499]6.3575,[500]6.3478,[501]6.3413,[502]6.3414,[503]6.3408,[504]6.3317,[505]6.3338,[506]6.3346,[507]6.3289,[508]6.3250,[509]6.3244,[510]6.3283,[511]6.3331,[512]6.3367,[513]6.3385,[514]6.3456,[515]6.3401,[516]6.3393,[517]6.3400,[518]6.3394,[519]6.3428,[520]6.3452,[521]6.3468,[522]6.3495,[523]6.3503,[524]6.3561,[525]6.3596,[526]6.3609,[527]6.3625,[528]6.3572,[529]6.3580,[530]6.3525,[531]6.3508,[532]6.3558,[533]6.3583,[534]6.3568,[535]6.3593,[536]6.3540,[537]6.3515,[538]6.3567,[539]6.3577,[540]6.3613,[541]6.3616,[542]6.3618,[543]6.3638,[544]6.3648,[545]6.3628,[546]6.3637,[547]6.3594,[548]6.3542,[549]6.3538,[550]6.3508,[551]6.3470,[552]6.3444,[553]6.3407,[554]6.3382,[555]6.3349,[556]6.3347,[557]6.3374,[558]6.3336,[559]6.3333,[560]6.3330,[561]6.3337,[562]6.3311,[563]6.3309,[564]6.3358,[565]6.3379,[566]6.3379,[567]6.3359,[568]6.3363,[569]6.3346,[570]6.3374,[571]6.3379,[572]6.3383,[573]6.3377,[574]6.3339,[575]6.3335,[576]6.3333,[577]6.3318,[578]6.3293,[579]6.3297,
[580]6.3232,[581]6.3195,[582]6.3186,[583]6.3193,[584]6.3193,[585]6.3115,[586]6.3044,[587]6.3051,[588]6.3098,[589]6.3154,[590]6.3185,[591]6.3205,[592]6.3193,[593]6.3155,[594]6.3164,[595]6.3139,[596]6.3174,[597]6.3151,[598]6.3125,[599]6.3149,[600]6.3147,[601]6.3135,[602]6.3156,[603]6.3181,[604]6.3189,[605]6.3228,[606]6.3249,[607]6.3236,[608]6.3199,[609]6.3203,[610]6.3241,[611]6.3224,[612]6.3250,[613]6.3212,[614]6.3164,[615]6.3086,[616]6.3114,[617]6.3050,[618]6.2999,[619]6.2940,[620]6.2796,[621]6.2725,[622]6.2708,[623]6.2723,[624]6.2730,[625]6.2730,[626]6.2722,[627]6.2746,[628]6.2748,[629]6.2742,[630]6.2773,[631]6.2831,[632]6.2890,[633]6.2872,[634]6.2909,[635]6.2914,[636]6.2878,[637]6.2844,[638]6.2874,[639]6.2841,[640]6.2853,[641]6.2855,[642]6.2922,[643]6.2940,[644]6.2952,[645]6.2937,[646]6.2982,[647]6.2944,[648]6.2957,[649]6.2957,[650]6.2997,[651]6.3050,[652]6.3060,[653]6.3104,[654]6.3037,[655]6.3029,

Please disregard this result; I was using a broken model. I am re-running the perplexity computation now.

slaren · Mar 26 '23 12:03

running on latest master, it starts out like this for me:

65.24 seconds per pass - ETA 11.87 hours
[1]4.4948,[2]4.9721,[3]5.8697,[4]6.4772,[5]6.6286,

your branch on my machine:

46.66 seconds per pass - ETA 8.49 hours
[1]4.5903,[2]5.0429,[3]5.9618,[4]6.5779,[5]6.6896,

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

$ make
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -mavx -mavx2 -mfma -mf16c -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:
I CC:       cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
I CXX:      g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Note: I tried to match your settings: ./perplexity -m ./models/7B/ggml-model-q4_1.bin -f wikitext-2-raw/wiki.test.raw -t 12

Green-Sky · Mar 26 '23 12:03

@Green-Sky does your system_info show the same flags as mine? I wonder if there is a different code path somewhere that may cause the difference. I get the same results even after rebasing to current master.

On master, my result is also different than yours:

62.33 seconds per pass - ETA 11.34 hours
[1]4.5381,[2]5.0059,[3]5.9007,

Just in case my model is broken somehow, this is the SHA256 hash:

0733914c21bc6beb432d0845f9c0abc6d12325447e64e20134c5fca72e039b79  models/7B/ggml-model-q4_1.bin

Can you verify if yours is the same?
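
For reference, a digest in this format is what sha256sum prints:

$ sha256sum models/7B/ggml-model-q4_1.bin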

slaren · Mar 26 '23 16:03

oh wow, it's different

21a45d7b56e495d3d1ec2615b779241b1285a6f8d17ba6e5d5c3db00c7d2ca2f  models/7B/ggml-model-q4_1.bin

I regenerated to double-check, and got the same hash again.

I also checked the source model:

700df0d3013b703a806d2ae7f1bfb8e59814e3d06ae78be0c66368a50059f33d  models/7B/consolidated.00.pth

which matches the SHA256SUMS file

Green-Sky · Mar 26 '23 17:03

@Green-Sky It looks like the problem was my model; after re-converting and re-quantizing it, I get the same checksum and perplexity as yours. I will re-run the perplexity computation in case there is a significant difference. Thanks for checking!

slaren · Mar 26 '23 17:03

If I understood the results correctly, @Green-Sky shows a major increase in speed with a slight decrease in accuracy? In addition to comparing cpuid flags, shouldn't you compare your gcc versions too, since the resulting binary code can vary depending on that? I'd think that the variations caused by compiler optimizations have a much greater effect on determinism than the processor's branch predictions and whatnot.

A side point related to this:

edit: -snip- as it doesn't really belong here, I made it a discussion topic:

  • #535

anzz1 · Mar 26 '23 17:03

Updated my previous post with the system_info and make output.

> shows a major increase in speed with a slight decrease in accuracy?

Yes; however, the perplexity is very unstable in the beginning, so a full run would be necessary.
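
For intuition about that instability: each bracketed value is a running perplexity, i.e. exp of the accumulated negative log-likelihood divided by the tokens seen so far, so early chunks move the average a lot and late chunks barely at all. A minimal sketch of that accumulation (the chunk size, placeholder inputs, and print format are assumptions based on the output in this thread, not the actual perplexity code):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Placeholder inputs: the summed -log p(token) of each evaluated chunk.
    // Real values would come from the model's logits, not from this sketch.
    std::vector<double> nll_per_chunk = { 777.0, 840.0, 910.0 };
    const long tokens_per_chunk = 512; // assumed: one chunk = n_ctx tokens

    double total_nll = 0.0;
    long   n_tokens  = 0;
    int    i         = 0;
    for (double nll : nll_per_chunk) {
        total_nll += nll;
        n_tokens  += tokens_per_chunk;
        // Running perplexity: exp(mean negative log-likelihood so far).
        printf("[%d]%.4f,", ++i, std::exp(total_nll / n_tokens));
    }
    printf("\n");
    return 0;
}
```

Because the denominator grows with every chunk, each new chunk shifts the average less, which is why only a full 655-chunk run gives a meaningful comparison.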

Green-Sky · Mar 26 '23 17:03

Perplexity: 6.3056 (7B q4_1)

Full run output
make && ./perplexity -m ./models/7B/ggml-model-q4_1.bin -f wikitext-2-raw/wiki.test.raw -t 12
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -mavx -mavx2 -mfma -mf16c -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

make: Nothing to be done for 'default'.
main: seed = 1679851786
llama_model_load: loading model from './models/7B/ggml-model-q4_1.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 3
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml ctx size = 5076.59 MB
llama_model_load: mem required  = 6868.59 MB (+ 1026.00 MB per state)
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_1.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4820.52 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
60.09 seconds per pass - ETA 10.93 hours
[1]4.5903,[2]5.0429,[3]5.9618,[4]6.5779,[5]6.6896,[6]6.6522,[7]6.8727,[8]6.9693,[9]7.2955,[10]7.5687,[11]7.7932,[12]7.8358,[13]7.7806,[14]7.8659,[15]8.1173,[16]7.7044,[17]7.5588,[18]7.5070,[19]7.1124,[20]7.0921,[21]6.9944,[22]6.8186,[23]6.7866,[24]6.6973,[25]6.6939,[26]6.5230,[27]6.3321,[28]6.2255,[29]6.1306,[30]5.9621,[31]5.9297,[32]5.9448,[33]5.8841,[34]5.9205,[35]5.9420,[36]5.9854,[37]5.9870,[38]5.9925,[39]6.0279,[40]6.0754,[41]6.0956,[42]6.1372,[43]6.0987,[44]6.1591,[45]6.1605,[46]6.1327,[47]6.1492,[48]6.1271,[49]6.1264,[50]6.0828,[51]6.0770,[52]6.0641,[53]6.1143,[54]6.0935,[55]6.0711,[56]6.1026,[57]6.1204,[58]6.1395,[59]6.1568,[60]6.2005,[61]6.1877,[62]6.2443,[63]6.2750,[64]6.2855,[65]6.3318,[66]6.3425,[67]6.3595,[68]6.3727,[69]6.3960,[70]6.4241,[71]6.4462,[72]6.4771,[73]6.5356,[74]6.5402,[75]6.5540,[76]6.5678,[77]6.5809,[78]6.5676,[79]6.5972,[80]6.5901,[81]6.6116,[82]6.6191,[83]6.5641,[84]6.5477,[85]6.5358,[86]6.5131,[87]6.4540,[88]6.4292,[89]6.4065,[90]6.3916,[91]6.4163,[92]6.4097,[93]6.4076,[94]6.4035,[95]6.4344,[96]6.4334,[97]6.4307,[98]6.4229,[99]6.4064,[100]6.4028,[101]6.4276,[102]6.4226,[103]6.4423,[104]6.4523,[105]6.4513,[106]6.4695,[107]6.4686,[108]6.4823,[109]6.4755,[110]6.4705,[111]6.4917,[112]6.5138,[113]6.5168,[114]6.5123,[115]6.5174,[116]6.5070,[117]6.5132,[118]6.5409,[119]6.5639,[120]6.6008,[121]6.6161,[122]6.6400,[123]6.6799,[124]6.6983,[125]6.6883,[126]6.7288,[127]6.7652,[128]6.7980,[129]6.7815,[130]6.7933,[131]6.7891,[132]6.7800,[133]6.7666,[134]6.7767,[135]6.7732,[136]6.7610,[137]6.7534,[138]6.7372,[139]6.7262,[140]6.7223,[141]6.6942,[142]6.6919,[143]6.6639,[144]6.6421,[145]6.6340,[146]6.6223,[147]6.6277,[148]6.6288,[149]6.6248,[150]6.6211,[151]6.6245,[152]6.6127,[153]6.5960,[154]6.5869,[155]6.5947,[156]6.5902,[157]6.6067,[158]6.6105,[159]6.6163,[160]6.6186,[161]6.6303,[162]6.6001,[163]6.5878,[164]6.5632,[165]6.5306,[166]6.5022,[167]6.4620,[168]6.4312,[169]6.4187,[170]6.4065,[171]6.3793,[172]6.3606,[173]6.3438,[174]6.3135,[175]6.2925,[176]6.2802,[177]6.2597,[178]6.2356,[179]6.2178,[180]6.2078,[181]6.1855,[182]6.1676,[183]6.1533,[184]6.1522,[185]6.1442,[186]6.1450,[187]6.1517,[188]6.1481,[189]6.1669,[190]6.1687,[191]6.1902,[192]6.2058,[193]6.2228,[194]6.2345,[195]6.2561,[196]6.2728,[197]6.2940,[198]6.3099,[199]6.3130,[200]6.3178,[201]6.3125,[202]6.3326,[203]6.3409,[204]6.3412,[205]6.3520,[206]6.3594,[207]6.3562,[208]6.3657,[209]6.3699,[210]6.3741,[211]6.3848,[212]6.3934,[213]6.4033,[214]6.4075,[215]6.4098,[216]6.4236,[217]6.4427,[218]6.4566,[219]6.4574,[220]6.4532,[221]6.4471,[222]6.4452,[223]6.4341,[224]6.4269,[225]6.4236,[226]6.4446,[227]6.4543,[228]6.4603,[229]6.4667,[230]6.4637,[231]6.4799,[232]6.4681,[233]6.4506,[234]6.4346,[235]6.4176,[236]6.4111,[237]6.4012,[238]6.4037,[239]6.3879,[240]6.3762,[241]6.3788,[242]6.3819,[243]6.3791,[244]6.3677,[245]6.3647,[246]6.3536,[247]6.3406,[248]6.3321,[249]6.3284,[250]6.3323,[251]6.3256,[252]6.3212,[253]6.3115,[254]6.3061,[255]6.2947,[256]6.2755,[257]6.2624,[258]6.2531,[259]6.2504,[260]6.2413,[261]6.2365,[262]6.2312,[263]6.2250,[264]6.2054,[265]6.2053,[266]6.2045,[267]6.1977,[268]6.2064,[269]6.2061,[270]6.2058,[271]6.2139,[272]6.2175,[273]6.2174,[274]6.2193,[275]6.2285,[276]6.2346,[277]6.2503,[278]6.2606,[279]6.2697,[280]6.2724,[281]6.2824,[282]6.2886,[283]6.3040,[284]6.3115,[285]6.3195,[286]6.3321,[287]6.3312,[288]6.3376,[289]6.3282,[290]6.3114,[291]6.2955,[292]6.2800,[293]6.2666,[294]6.2689,[295]6.2679,[296]6.2733,[297]6.2728,[298]6.2765,[299]6.2737,[300]6.2626,[301]6.2621,[302]6.2543,[303]6.2449,[304]6.2360,[305]6.2325,[306]6.2200,
[307]6.2222,[308]6.2250,[309]6.2083,[310]6.2020,[311]6.1956,[312]6.1979,[313]6.1922,[314]6.1909,[315]6.1747,[316]6.1704,[317]6.1539,[318]6.1327,[319]6.1453,[320]6.1573,[321]6.1615,[322]6.1572,[323]6.1502,[324]6.1472,[325]6.1580,[326]6.1580,[327]6.1597,[328]6.1627,[329]6.1687,[330]6.1718,[331]6.1839,[332]6.1807,[333]6.1886,[334]6.1828,[335]6.1760,[336]6.1794,[337]6.1767,[338]6.1761,[339]6.1704,[340]6.1660,[341]6.1738,[342]6.1766,[343]6.1812,[344]6.1813,[345]6.1812,[346]6.1780,[347]6.1816,[348]6.1851,[349]6.1874,[350]6.1845,[351]6.1852,[352]6.1850,[353]6.1787,[354]6.1797,[355]6.1847,[356]6.1882,[357]6.1852,[358]6.1947,[359]6.1969,[360]6.1935,[361]6.1926,[362]6.1995,[363]6.2106,[364]6.2168,[365]6.2215,[366]6.2232,[367]6.2319,[368]6.2287,[369]6.2299,[370]6.2320,[371]6.2264,[372]6.2313,[373]6.2360,[374]6.2339,[375]6.2337,[376]6.2406,[377]6.2357,[378]6.2381,[379]6.2442,[380]6.2361,[381]6.2325,[382]6.2279,[383]6.2270,[384]6.2265,[385]6.2256,[386]6.2257,[387]6.2261,[388]6.2220,[389]6.2165,[390]6.2098,[391]6.2021,[392]6.1978,[393]6.1964,[394]6.1990,[395]6.1975,[396]6.1899,[397]6.1972,[398]6.2014,[399]6.2090,[400]6.2084,[401]6.2095,[402]6.2106,[403]6.2125,[404]6.2192,[405]6.2105,[406]6.2078,[407]6.2075,[408]6.2095,[409]6.2215,[410]6.2328,[411]6.2447,[412]6.2608,[413]6.2719,[414]6.2801,[415]6.2857,[416]6.2938,[417]6.3065,[418]6.3102,[419]6.3178,[420]6.3273,[421]6.3392,[422]6.3433,[423]6.3502,[424]6.3610,[425]6.3702,[426]6.3770,[427]6.3816,[428]6.3902,[429]6.3958,[430]6.4038,[431]6.4182,[432]6.4224,[433]6.4215,[434]6.4168,[435]6.4179,[436]6.4202,[437]6.4302,[438]6.4379,[439]6.4347,[440]6.4333,[441]6.4283,[442]6.4264,[443]6.4274,[444]6.4275,[445]6.4256,[446]6.4284,[447]6.4313,[448]6.4354,[449]6.4329,[450]6.4334,[451]6.4293,[452]6.4177,[453]6.4093,[454]6.4038,[455]6.4045,[456]6.4097,[457]6.4117,[458]6.4094,[459]6.4102,[460]6.4191,[461]6.4162,[462]6.4148,[463]6.4192,[464]6.4179,[465]6.4154,[466]6.4077,[467]6.4087,[468]6.4088,[469]6.4112,[470]6.4122,[471]6.4076,[472]6.4127,[473]6.4073,[474]6.4089,[475]6.4028,[476]6.4050,[477]6.3983,[478]6.3978,[479]6.4043,[480]6.4089,[481]6.4104,[482]6.4060,[483]6.4019,[484]6.4038,[485]6.4021,[486]6.3958,[487]6.3956,[488]6.3936,[489]6.3886,[490]6.3865,[491]6.3837,[492]6.3778,[493]6.3749,[494]6.3729,[495]6.3729,[496]6.3692,[497]6.3638,[498]6.3621,[499]6.3572,[500]6.3475,[501]6.3412,[502]6.3412,[503]6.3406,[504]6.3315,[505]6.3335,[506]6.3345,[507]6.3290,[508]6.3250,[509]6.3246,[510]6.3283,[511]6.3331,[512]6.3366,[513]6.3385,[514]6.3454,[515]6.3399,[516]6.3392,[517]6.3399,[518]6.3393,[519]6.3428,[520]6.3452,[521]6.3470,[522]6.3498,[523]6.3507,[524]6.3565,[525]6.3600,[526]6.3613,[527]6.3629,[528]6.3577,[529]6.3587,[530]6.3531,[531]6.3513,[532]6.3563,[533]6.3585,[534]6.3569,[535]6.3592,[536]6.3539,[537]6.3515,[538]6.3569,[539]6.3578,[540]6.3614,[541]6.3618,[542]6.3621,[543]6.3638,[544]6.3647,[545]6.3628,[546]6.3636,[547]6.3594,[548]6.3543,[549]6.3539,[550]6.3512,[551]6.3473,[552]6.3448,[553]6.3410,[554]6.3387,[555]6.3355,[556]6.3353,[557]6.3380,[558]6.3343,[559]6.3341,[560]6.3339,[561]6.3345,[562]6.3320,[563]6.3317,[564]6.3366,[565]6.3388,[566]6.3388,[567]6.3369,[568]6.3372,[569]6.3357,[570]6.3383,[571]6.3387,[572]6.3392,[573]6.3384,[574]6.3348,[575]6.3345,[576]6.3344,[577]6.3327,[578]6.3302,[579]6.3306,[580]6.3240,[581]6.3203,[582]6.3194,[583]6.3202,[584]6.3203,[585]6.3124,[586]6.3054,[587]6.3062,[588]6.3110,[589]6.3166,[590]6.3199,[591]6.3219,[592]6.3208,[593]6.3171,[594]6.3180,[595]6.3155,[596]6.3190,[597]6.3167,[598]6.3142,[599]6.3164,[600]6.3163,[601]6.3150,[602]6.3170,
[603]6.3196,[604]6.3205,[605]6.3244,[606]6.3266,[607]6.3253,[608]6.3216,[609]6.3219,[610]6.3256,[611]6.3240,[612]6.3267,[613]6.3229,[614]6.3181,[615]6.3103,[616]6.3132,[617]6.3069,[618]6.3019,[619]6.2962,[620]6.2818,[621]6.2747,[622]6.2730,[623]6.2747,[624]6.2752,[625]6.2751,[626]6.2743,[627]6.2769,[628]6.2770,[629]6.2766,[630]6.2796,[631]6.2854,[632]6.2914,[633]6.2898,[634]6.2935,[635]6.2940,[636]6.2904,[637]6.2870,[638]6.2900,[639]6.2867,[640]6.2879,[641]6.2880,[642]6.2946,[643]6.2966,[644]6.2980,[645]6.2963,[646]6.3009,[647]6.2971,[648]6.2984,[649]6.2985,[650]6.3024,[651]6.3077,[652]6.3087,[653]6.3129,[654]6.3063,[655]6.3056,

slaren · Mar 27 '23 10:03

Rebased to master.

slaren · Mar 28 '23 17:03

The bot almost got it right: the purpose of using the reference implementation in ggml_quantize_q4_1 is to ensure accuracy when quantizing the model.
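
That is, file-time quantization keeps the scalar path even though the per-row runtime path is vectorized. A hedged sketch of that split follows; the wiring is assumed from the comment, not lifted from the PR:

```cpp
// Two variants with the same contract (signatures assumed for illustration):
void quantize_row_q4_1_reference(const float * x, void * y, int k); // scalar, deterministic
void quantize_row_q4_1          (const float * x, void * y, int k); // AVX2 where available

// Sketch: when writing the quantized model file, always take the reference
// path so the .bin on disk does not depend on the host CPU's SIMD features;
// the fast path is reserved for runtime work on hot rows.
void quantize_tensor_for_file(const float * src, void * dst, int k) {
    quantize_row_q4_1_reference(src, dst, k);
}
```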

slaren · Mar 28 '23 17:03

Ironically, after the changes to master I am seeing slightly lower perplexity with the AVX path in the first chunks.

master: [1]4.5870,[2]5.0477,[3]5.9136,[4]6.5310,[5]6.6497,

avx2: [1]4.5671,[2]5.0153,[3]5.8921,[4]6.4689,[5]6.5678,

🤷‍♂️

slaren · Mar 28 '23 18:03

I guess we must be doing something right 🦙

ggerganov · Mar 28 '23 18:03