
Add SIMD implementation of ggml_compute_forward_rms_norm_f32

Open slaren opened this issue 1 year ago • 1 comments

This uses the GGML SIMD macros, so it should work on other architectures as well, but I have only tested it with AVX2.

Don't expect any meaningful performance improvement; the function is not very hot.

Perplexity after this change (7B, q4_0): 6.5980

Full run output

./main -m ./models/7B/ggml-model-q4_0.bin --perplexity -t 12 -f wikitext-2-raw/wiki.test.raw
main: seed = 1679622068
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | perplexity : calculating perplexity over 655 chunks 44.42 seconds per pass - ETA 8.08 hours [1]4.7114,[2]5.1906,[3]6.0631,[4]6.7550,[5]6.8446,[6]6.8386,[7]7.0409,[8]7.1527,[9]7.5598,[10]7.7969,[11]8.0295,[12]8.0414,[13]7.9639,[14]8.0505,[15]8.3085,[16]7.8944,[17]7.7598,[18]7.7382,[19]7.3351,[20]7.3199,[21]7.2169,[22]7.0259,[23]6.9849,[24]6.8973,[25]6.9077,[26]6.7192,[27]6.5322,[28]6.4300,[29]6.3396,[30]6.1645,[31]6.1348,[32]6.1517,[33]6.0782,[34]6.1209,[35]6.1473,[36]6.1898,[37]6.1899,[38]6.2031,[39]6.2446,[40]6.3025,[41]6.3138,[42]6.3502,[43]6.3025,[44]6.3605,[45]6.3612,[46]6.3311,[47]6.3573,[48]6.3292,[49]6.3372,[50]6.2981,[51]6.2917,[52]6.2815,[53]6.3305,[54]6.3116,[55]6.2910,[56]6.3316,[57]6.3587,[58]6.3818,[59]6.3926,[60]6.4389,[61]6.4274,[62]6.4922,[63]6.5292,[64]6.5446,[65]6.5959,[66]6.6112,[67]6.6276,[68]6.6449,[69]6.6738,[70]6.7064,[71]6.7333,[72]6.7657,[73]6.8332,[74]6.8364,[75]6.8536,[76]6.8687,[77]6.8844,[78]6.8698,[79]6.8997,[80]6.8911,[81]6.9051,[82]6.9124,[83]6.8562,[84]6.8423,[85]6.8377,[86]6.8144,[87]6.7518,[88]6.7245,[89]6.7046,[90]6.6873,[91]6.7184,[92]6.7128,[93]6.7157,[94]6.7149,[95]6.7477,[96]6.7466,[97]6.7416,[98]6.7350,[99]6.7177,[100]6.7154,[101]6.7402,[102]6.7337,[103]6.7586,[104]6.7642,[105]6.7633,[106]6.7790,[107]6.7792,[108]6.7870,[109]6.7803,[110]6.7725,[111]6.7956,[112]6.8164,[113]6.8175,[114]6.8138,[115]6.8237,[116]6.8156,[117]6.8208,[118]6.8504,[119]6.8723,[120]6.9123,[121]6.9303,[122]6.9570,[123]6.9971,[124]7.0147,[125]7.0044,[126]7.0463,[127]7.0859,[128]7.1153,[129]7.0964,[130]7.1063,[131]7.1002,[132]7.0924,[133]7.0800,[134]7.0907,[135]7.0879,[136]7.0736,[137]7.0657,[138]7.0504,[139]7.0392,[140]7.0361,[141]7.0095,[142]7.0062,[143]6.9775,[144]6.9563,[145]6.9482,[146]6.9339,[147]6.9402,[148]6.9417,[149]6.9379,[150]6.9329,[151]6.9353,[152]6.9261,[153]6.9078,[15
4]6.8983,[155]6.9044,[156]6.9000,[157]6.9182,[158]6.9210,[159]6.9267,[160]6.9309,[161]6.9435,[162]6.9103,[163]6.8968,[164]6.8695,[165]6.8349,[166]6.8041,[167]6.7628,[168]6.7285,[169]6.7135,[170]6.7002,[171]6.6698,[172]6.6515,[173]6.6329,[174]6.5997,[175]6.5754,[176]6.5626,[177]6.5421,[178]6.5181,[179]6.5002,[180]6.4903,[181]6.4657,[182]6.4461,[183]6.4302,[184]6.4303,[185]6.4221,[186]6.4243,[187]6.4300,[188]6.4266,[189]6.4453,[190]6.4473,[191]6.4681,[192]6.4844,[193]6.5029,[194]6.5150,[195]6.5373,[196]6.5548,[197]6.5795,[198]6.5967,[199]6.5986,[200]6.6019,[201]6.5977,[202]6.6201,[203]6.6265,[204]6.6283,[205]6.6391,[206]6.6467,[207]6.6421,[208]6.6507,[209]6.6569,[210]6.6620,[211]6.6745,[212]6.6828,[213]6.6932,[214]6.6982,[215]6.7015,[216]6.7164,[217]6.7344,[218]6.7478,[219]6.7485,[220]6.7446,[221]6.7388,[222]6.7352,[223]6.7233,[224]6.7170,[225]6.7116,[226]6.7334,[227]6.7449,[228]6.7513,[229]6.7583,[230]6.7549,[231]6.7718,[232]6.7580,[233]6.7394,[234]6.7227,[235]6.7078,[236]6.6991,[237]6.6887,[238]6.6921,[239]6.6745,[240]6.6629,[241]6.6677,[242]6.6716,[243]6.6695,[244]6.6565,[245]6.6527,[246]6.6399,[247]6.6273,[248]6.6190,[249]6.6171,[250]6.6223,[251]6.6147,[252]6.6105,[253]6.6001,[254]6.5960,[255]6.5826,[256]6.5625,[257]6.5502,[258]6.5418,[259]6.5396,[260]6.5309,[261]6.5267,[262]6.5210,[263]6.5152,[264]6.4968,[265]6.4964,[266]6.4952,[267]6.4883,[268]6.5001,[269]6.4982,[270]6.4991,[271]6.5069,[272]6.5119,[273]6.5104,[274]6.5121,[275]6.5214,[276]6.5264,[277]6.5448,[278]6.5559,[279]6.5643,[280]6.5684,[281]6.5794,[282]6.5856,[283]6.6008,[284]6.6084,[285]6.6175,[286]6.6323,[287]6.6317,[288]6.6380,[289]6.6283,[290]6.6129,[291]6.5968,[292]6.5802,[293]6.5651,[294]6.5679,[295]6.5664,[296]6.5702,[297]6.5689,[298]6.5720,[299]6.5691,[300]6.5572,[301]6.5569,[302]6.5489,[303]6.5398,[304]6.5298,[305]6.5279,[306]6.5141,[307]6.5170,[308]6.5205,[309]6.5038,[310]6.4971,[311]6.4907,[312]6.4925,[313]6.4870,[314]6.4867,[315]6.4689,[316]6.4651,[317]6.4476,[318]6.4244,[319]6.4370,[320]6.4510
,[321]6.4545,[322]6.4497,[323]6.4427,[324]6.4403,[325]6.4499,[326]6.4502,[327]6.4524,[328]6.4572,[329]6.4636,[330]6.4661,[331]6.4793,[332]6.4760,[333]6.4837,[334]6.4771,[335]6.4699,[336]6.4734,[337]6.4695,[338]6.4683,[339]6.4627,[340]6.4580,[341]6.4661,[342]6.4688,[343]6.4744,[344]6.4746,[345]6.4746,[346]6.4714,[347]6.4760,[348]6.4803,[349]6.4819,[350]6.4784,[351]6.4788,[352]6.4786,[353]6.4732,[354]6.4738,[355]6.4794,[356]6.4822,[357]6.4782,[358]6.4877,[359]6.4910,[360]6.4871,[361]6.4870,[362]6.4944,[363]6.5066,[364]6.5130,[365]6.5190,[366]6.5199,[367]6.5286,[368]6.5261,[369]6.5274,[370]6.5282,[371]6.5219,[372]6.5270,[373]6.5327,[374]6.5307,[375]6.5295,[376]6.5383,[377]6.5327,[378]6.5349,[379]6.5415,[380]6.5315,[381]6.5270,[382]6.5203,[383]6.5184,[384]6.5177,[385]6.5165,[386]6.5160,[387]6.5152,[388]6.5104,[389]6.5041,[390]6.4966,[391]6.4883,[392]6.4841,[393]6.4827,[394]6.4852,[395]6.4835,[396]6.4754,[397]6.4834,[398]6.4876,[399]6.4972,[400]6.4971,[401]6.4985,[402]6.4995,[403]6.5011,[404]6.5080,[405]6.4991,[406]6.4954,[407]6.4952,[408]6.4959,[409]6.5087,[410]6.5207,[411]6.5333,[412]6.5502,[413]6.5618,[414]6.5694,[415]6.5756,[416]6.5837,[417]6.5978,[418]6.6021,[419]6.6099,[420]6.6196,[421]6.6315,[422]6.6367,[423]6.6440,[424]6.6563,[425]6.6661,[426]6.6733,[427]6.6780,[428]6.6866,[429]6.6908,[430]6.7000,[431]6.7147,[432]6.7190,[433]6.7174,[434]6.7120,[435]6.7125,[436]6.7151,[437]6.7250,[438]6.7328,[439]6.7290,[440]6.7280,[441]6.7224,[442]6.7208,[443]6.7218,[444]6.7228,[445]6.7206,[446]6.7228,[447]6.7260,[448]6.7301,[449]6.7274,[450]6.7279,[451]6.7233,[452]6.7121,[453]6.7037,[454]6.6979,[455]6.6985,[456]6.7033,[457]6.7055,[458]6.7031,[459]6.7040,[460]6.7135,[461]6.7108,[462]6.7094,[463]6.7146,[464]6.7136,[465]6.7108,[466]6.7027,[467]6.7036,[468]6.7038,[469]6.7060,[470]6.7068,[471]6.7016,[472]6.7068,[473]6.7008,[474]6.7029,[475]6.6973,[476]6.6996,[477]6.6927,[478]6.6921,[479]6.6996,[480]6.7051,[481]6.7071,[482]6.7030,[483]6.6990,[484]6.7020,[485]6.7000,[486]6.6941,[487]6.
6946,[488]6.6929,[489]6.6875,[490]6.6843,[491]6.6814,[492]6.6756,[493]6.6726,[494]6.6708,[495]6.6707,[496]6.6675,[497]6.6622,[498]6.6607,[499]6.6553,[500]6.6454,[501]6.6386,[502]6.6379,[503]6.6377,[504]6.6279,[505]6.6308,[506]6.6315,[507]6.6257,[508]6.6216,[509]6.6201,[510]6.6240,[511]6.6291,[512]6.6326,[513]6.6340,[514]6.6411,[515]6.6351,[516]6.6339,[517]6.6345,[518]6.6341,[519]6.6370,[520]6.6400,[521]6.6418,[522]6.6447,[523]6.6456,[524]6.6526,[525]6.6567,[526]6.6575,[527]6.6597,[528]6.6540,[529]6.6548,[530]6.6494,[531]6.6476,[532]6.6525,[533]6.6549,[534]6.6524,[535]6.6553,[536]6.6498,[537]6.6471,[538]6.6526,[539]6.6534,[540]6.6576,[541]6.6587,[542]6.6596,[543]6.6612,[544]6.6628,[545]6.6606,[546]6.6614,[547]6.6565,[548]6.6504,[549]6.6504,[550]6.6469,[551]6.6433,[552]6.6412,[553]6.6366,[554]6.6341,[555]6.6310,[556]6.6309,[557]6.6336,[558]6.6299,[559]6.6293,[560]6.6289,[561]6.6291,[562]6.6272,[563]6.6273,[564]6.6315,[565]6.6336,[566]6.6331,[567]6.6310,[568]6.6312,[569]6.6295,[570]6.6322,[571]6.6327,[572]6.6333,[573]6.6334,[574]6.6299,[575]6.6300,[576]6.6299,[577]6.6288,[578]6.6262,[579]6.6271,[580]6.6203,[581]6.6161,[582]6.6151,[583]6.6154,[584]6.6156,[585]6.6077,[586]6.6004,[587]6.6010,[588]6.6060,[589]6.6120,[590]6.6149,[591]6.6167,[592]6.6153,[593]6.6115,[594]6.6123,[595]6.6099,[596]6.6143,[597]6.6116,[598]6.6080,[599]6.6104,[600]6.6103,[601]6.6091,[602]6.6112,[603]6.6140,[604]6.6152,[605]6.6187,[606]6.6208,[607]6.6196,[608]6.6157,[609]6.6167,[610]6.6204,[611]6.6183,[612]6.6212,[613]6.6174,[614]6.6119,[615]6.6039,[616]6.6071,[617]6.6007,[618]6.5950,[619]6.5888,[620]6.5736,[621]6.5660,[622]6.5641,[623]6.5657,[624]6.5661,[625]6.5661,[626]6.5648,[627]6.5672,[628]6.5677,[629]6.5671,[630]6.5709,[631]6.5772,[632]6.5829,[633]6.5810,[634]6.5845,[635]6.5851,[636]6.5819,[637]6.5787,[638]6.5816,[639]6.5786,[640]6.5796,[641]6.5797,[642]6.5868,[643]6.5890,[644]6.5903,[645]6.5879,[646]6.5922,[647]6.5892,[648]6.5903,[649]6.5902,[650]6.5939,[651]6.5999,[652]6.6006,[653]6.6051,[65
4]6.5984,[655]6.5980,

slaren, Mar 24 '23 01:03

I think that if the performance does not change, it is not worth making the code more cumbersome. I will think about this some more and maybe merge it at a later stage.

ggerganov, Mar 24 '23 15:03

Closing for now, since it doesn't look like this is going to be useful any time soon. There are many other ops that would be more important to optimize than this one.

slaren, Apr 29 '23 16:04