llama.cpp
Add SIMD implementation of ggml_compute_forward_rms_norm_f32
This uses the GGML SIMD macros, so it should work on different architectures, but it has only been tested with AVX2.
Don't expect any meaningful performance improvement; the function is not very hot.
Perplexity after this change (7B, q4_0): 6.5980
Full run output
./main -m ./models/7B/ggml-model-q4_0.bin --perplexity -t 12 -f wikitext-2-raw/wiki.test.raw
main: seed = 1679622068
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | perplexity : calculating perplexity over 655 chunks 44.42 seconds per pass - ETA 8.08 hours [1]4.7114,[2]5.1906,[3]6.0631,[4]6.7550,[5]6.8446,[6]6.8386,[7]7.0409,[8]7.1527,[9]7.5598,[10]7.7969,[11]8.0295,[12]8.0414,[13]7.9639,[14]8.0505,[15]8.3085,[16]7.8944,[17]7.7598,[18]7.7382,[19]7.3351,[20]7.3199,[21]7.2169,[22]7.0259,[23]6.9849,[24]6.8973,[25]6.9077,[26]6.7192,[27]6.5322,[28]6.4300,[29]6.3396,[30]6.1645,[31]6.1348,[32]6.1517,[33]6.0782,[34]6.1209,[35]6.1473,[36]6.1898,[37]6.1899,[38]6.2031,[39]6.2446,[40]6.3025,[41]6.3138,[42]6.3502,[43]6.3025,[44]6.3605,[45]6.3612,[46]6.3311,[47]6.3573,[48]6.3292,[49]6.3372,[50]6.2981,[51]6.2917,[52]6.2815,[53]6.3305,[54]6.3116,[55]6.2910,[56]6.3316,[57]6.3587,[58]6.3818,[59]6.3926,[60]6.4389,[61]6.4274,[62]6.4922,[63]6.5292,[64]6.5446,[65]6.5959,[66]6.6112,[67]6.6276,[68]6.6449,[69]6.6738,[70]6.7064,[71]6.7333,[72]6.7657,[73]6.8332,[74]6.8364,[75]6.8536,[76]6.8687,[77]6.8844,[78]6.8698,[79]6.8997,[80]6.8911,[81]6.9051,[82]6.9124,[83]6.8562,[84]6.8423,[85]6.8377,[86]6.8144,[87]6.7518,[88]6.7245,[89]6.7046,[90]6.6873,[91]6.7184,[92]6.7128,[93]6.7157,[94]6.7149,[95]6.7477,[96]6.7466,[97]6.7416,[98]6.7350,[99]6.7177,[100]6.7154,[101]6.7402,[102]6.7337,[103]6.7586,[104]6.7642,[105]6.7633,[106]6.7790,[107]6.7792,[108]6.7870,[109]6.7803,[110]6.7725,[111]6.7956,[112]6.8164,[113]6.8175,[114]6.8138,[115]6.8237,[116]6.8156,[117]6.8208,[118]6.8504,[119]6.8723,[120]6.9123,[121]6.9303,[122]6.9570,[123]6.9971,[124]7.0147,[125]7.0044,[126]7.0463,[127]7.0859,[128]7.1153,[129]7.0964,[130]7.1063,[131]7.1002,[132]7.0924,[133]7.0800,[134]7.0907,[135]7.0879,[136]7.0736,[137]7.0657,[138]7.0504,[139]7.0392,[140]7.0361,[141]7.0095,[142]7.0062,[143]6.9775,[144]6.9563,[145]6.9482,[146]6.9339,[147]6.9402,[148]6.9417,[149]6.9379,[150]6.9329,[151]6.9353,[152]6.9261,[153]6.9078,[15
4]6.8983,[155]6.9044,[156]6.9000,[157]6.9182,[158]6.9210,[159]6.9267,[160]6.9309,[161]6.9435,[162]6.9103,[163]6.8968,[164]6.8695,[165]6.8349,[166]6.8041,[167]6.7628,[168]6.7285,[169]6.7135,[170]6.7002,[171]6.6698,[172]6.6515,[173]6.6329,[174]6.5997,[175]6.5754,[176]6.5626,[177]6.5421,[178]6.5181,[179]6.5002,[180]6.4903,[181]6.4657,[182]6.4461,[183]6.4302,[184]6.4303,[185]6.4221,[186]6.4243,[187]6.4300,[188]6.4266,[189]6.4453,[190]6.4473,[191]6.4681,[192]6.4844,[193]6.5029,[194]6.5150,[195]6.5373,[196]6.5548,[197]6.5795,[198]6.5967,[199]6.5986,[200]6.6019,[201]6.5977,[202]6.6201,[203]6.6265,[204]6.6283,[205]6.6391,[206]6.6467,[207]6.6421,[208]6.6507,[209]6.6569,[210]6.6620,[211]6.6745,[212]6.6828,[213]6.6932,[214]6.6982,[215]6.7015,[216]6.7164,[217]6.7344,[218]6.7478,[219]6.7485,[220]6.7446,[221]6.7388,[222]6.7352,[223]6.7233,[224]6.7170,[225]6.7116,[226]6.7334,[227]6.7449,[228]6.7513,[229]6.7583,[230]6.7549,[231]6.7718,[232]6.7580,[233]6.7394,[234]6.7227,[235]6.7078,[236]6.6991,[237]6.6887,[238]6.6921,[239]6.6745,[240]6.6629,[241]6.6677,[242]6.6716,[243]6.6695,[244]6.6565,[245]6.6527,[246]6.6399,[247]6.6273,[248]6.6190,[249]6.6171,[250]6.6223,[251]6.6147,[252]6.6105,[253]6.6001,[254]6.5960,[255]6.5826,[256]6.5625,[257]6.5502,[258]6.5418,[259]6.5396,[260]6.5309,[261]6.5267,[262]6.5210,[263]6.5152,[264]6.4968,[265]6.4964,[266]6.4952,[267]6.4883,[268]6.5001,[269]6.4982,[270]6.4991,[271]6.5069,[272]6.5119,[273]6.5104,[274]6.5121,[275]6.5214,[276]6.5264,[277]6.5448,[278]6.5559,[279]6.5643,[280]6.5684,[281]6.5794,[282]6.5856,[283]6.6008,[284]6.6084,[285]6.6175,[286]6.6323,[287]6.6317,[288]6.6380,[289]6.6283,[290]6.6129,[291]6.5968,[292]6.5802,[293]6.5651,[294]6.5679,[295]6.5664,[296]6.5702,[297]6.5689,[298]6.5720,[299]6.5691,[300]6.5572,[301]6.5569,[302]6.5489,[303]6.5398,[304]6.5298,[305]6.5279,[306]6.5141,[307]6.5170,[308]6.5205,[309]6.5038,[310]6.4971,[311]6.4907,[312]6.4925,[313]6.4870,[314]6.4867,[315]6.4689,[316]6.4651,[317]6.4476,[318]6.4244,[319]6.4370,[320]6.4510
,[321]6.4545,[322]6.4497,[323]6.4427,[324]6.4403,[325]6.4499,[326]6.4502,[327]6.4524,[328]6.4572,[329]6.4636,[330]6.4661,[331]6.4793,[332]6.4760,[333]6.4837,[334]6.4771,[335]6.4699,[336]6.4734,[337]6.4695,[338]6.4683,[339]6.4627,[340]6.4580,[341]6.4661,[342]6.4688,[343]6.4744,[344]6.4746,[345]6.4746,[346]6.4714,[347]6.4760,[348]6.4803,[349]6.4819,[350]6.4784,[351]6.4788,[352]6.4786,[353]6.4732,[354]6.4738,[355]6.4794,[356]6.4822,[357]6.4782,[358]6.4877,[359]6.4910,[360]6.4871,[361]6.4870,[362]6.4944,[363]6.5066,[364]6.5130,[365]6.5190,[366]6.5199,[367]6.5286,[368]6.5261,[369]6.5274,[370]6.5282,[371]6.5219,[372]6.5270,[373]6.5327,[374]6.5307,[375]6.5295,[376]6.5383,[377]6.5327,[378]6.5349,[379]6.5415,[380]6.5315,[381]6.5270,[382]6.5203,[383]6.5184,[384]6.5177,[385]6.5165,[386]6.5160,[387]6.5152,[388]6.5104,[389]6.5041,[390]6.4966,[391]6.4883,[392]6.4841,[393]6.4827,[394]6.4852,[395]6.4835,[396]6.4754,[397]6.4834,[398]6.4876,[399]6.4972,[400]6.4971,[401]6.4985,[402]6.4995,[403]6.5011,[404]6.5080,[405]6.4991,[406]6.4954,[407]6.4952,[408]6.4959,[409]6.5087,[410]6.5207,[411]6.5333,[412]6.5502,[413]6.5618,[414]6.5694,[415]6.5756,[416]6.5837,[417]6.5978,[418]6.6021,[419]6.6099,[420]6.6196,[421]6.6315,[422]6.6367,[423]6.6440,[424]6.6563,[425]6.6661,[426]6.6733,[427]6.6780,[428]6.6866,[429]6.6908,[430]6.7000,[431]6.7147,[432]6.7190,[433]6.7174,[434]6.7120,[435]6.7125,[436]6.7151,[437]6.7250,[438]6.7328,[439]6.7290,[440]6.7280,[441]6.7224,[442]6.7208,[443]6.7218,[444]6.7228,[445]6.7206,[446]6.7228,[447]6.7260,[448]6.7301,[449]6.7274,[450]6.7279,[451]6.7233,[452]6.7121,[453]6.7037,[454]6.6979,[455]6.6985,[456]6.7033,[457]6.7055,[458]6.7031,[459]6.7040,[460]6.7135,[461]6.7108,[462]6.7094,[463]6.7146,[464]6.7136,[465]6.7108,[466]6.7027,[467]6.7036,[468]6.7038,[469]6.7060,[470]6.7068,[471]6.7016,[472]6.7068,[473]6.7008,[474]6.7029,[475]6.6973,[476]6.6996,[477]6.6927,[478]6.6921,[479]6.6996,[480]6.7051,[481]6.7071,[482]6.7030,[483]6.6990,[484]6.7020,[485]6.7000,[486]6.6941,[487]6.
6946,[488]6.6929,[489]6.6875,[490]6.6843,[491]6.6814,[492]6.6756,[493]6.6726,[494]6.6708,[495]6.6707,[496]6.6675,[497]6.6622,[498]6.6607,[499]6.6553,[500]6.6454,[501]6.6386,[502]6.6379,[503]6.6377,[504]6.6279,[505]6.6308,[506]6.6315,[507]6.6257,[508]6.6216,[509]6.6201,[510]6.6240,[511]6.6291,[512]6.6326,[513]6.6340,[514]6.6411,[515]6.6351,[516]6.6339,[517]6.6345,[518]6.6341,[519]6.6370,[520]6.6400,[521]6.6418,[522]6.6447,[523]6.6456,[524]6.6526,[525]6.6567,[526]6.6575,[527]6.6597,[528]6.6540,[529]6.6548,[530]6.6494,[531]6.6476,[532]6.6525,[533]6.6549,[534]6.6524,[535]6.6553,[536]6.6498,[537]6.6471,[538]6.6526,[539]6.6534,[540]6.6576,[541]6.6587,[542]6.6596,[543]6.6612,[544]6.6628,[545]6.6606,[546]6.6614,[547]6.6565,[548]6.6504,[549]6.6504,[550]6.6469,[551]6.6433,[552]6.6412,[553]6.6366,[554]6.6341,[555]6.6310,[556]6.6309,[557]6.6336,[558]6.6299,[559]6.6293,[560]6.6289,[561]6.6291,[562]6.6272,[563]6.6273,[564]6.6315,[565]6.6336,[566]6.6331,[567]6.6310,[568]6.6312,[569]6.6295,[570]6.6322,[571]6.6327,[572]6.6333,[573]6.6334,[574]6.6299,[575]6.6300,[576]6.6299,[577]6.6288,[578]6.6262,[579]6.6271,[580]6.6203,[581]6.6161,[582]6.6151,[583]6.6154,[584]6.6156,[585]6.6077,[586]6.6004,[587]6.6010,[588]6.6060,[589]6.6120,[590]6.6149,[591]6.6167,[592]6.6153,[593]6.6115,[594]6.6123,[595]6.6099,[596]6.6143,[597]6.6116,[598]6.6080,[599]6.6104,[600]6.6103,[601]6.6091,[602]6.6112,[603]6.6140,[604]6.6152,[605]6.6187,[606]6.6208,[607]6.6196,[608]6.6157,[609]6.6167,[610]6.6204,[611]6.6183,[612]6.6212,[613]6.6174,[614]6.6119,[615]6.6039,[616]6.6071,[617]6.6007,[618]6.5950,[619]6.5888,[620]6.5736,[621]6.5660,[622]6.5641,[623]6.5657,[624]6.5661,[625]6.5661,[626]6.5648,[627]6.5672,[628]6.5677,[629]6.5671,[630]6.5709,[631]6.5772,[632]6.5829,[633]6.5810,[634]6.5845,[635]6.5851,[636]6.5819,[637]6.5787,[638]6.5816,[639]6.5786,[640]6.5796,[641]6.5797,[642]6.5868,[643]6.5890,[644]6.5903,[645]6.5879,[646]6.5922,[647]6.5892,[648]6.5903,[649]6.5902,[650]6.5939,[651]6.5999,[652]6.6006,[653]6.6051,[65
4]6.5984,[655]6.5980,
I think if the performance does not change, it is not worth making the code more cumbersome. I will think about this some more and maybe merge at a later stage.
Closing for now, since it doesn't look like this is going to be useful any time soon. There are many other ops that would be more important to optimize than this one.