ggml : add Q5_0 and Q5_1 quantization

Open ggerganov opened this issue 1 year ago • 0 comments

Follow-up to the idea by @ikawrakow in https://github.com/ggerganov/llama.cpp/pull/729#issuecomment-1521825435

Q5_0

#define QK5_0 32
typedef struct {
    ggml_fp16_t d;          // delta
    uint8_t qh[4];          // 5th (high) bit of each quant, 32 bits stored as 4 bytes (read as a uint32_t)
    uint8_t qs[QK5_0 / 2];  // nibbles / quants
} block_q5_0;
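
To make the layout concrete, here is a minimal sketch of how one block could be quantized and dequantized. Several things in it are assumptions and not taken from the issue: `float` stands in for `ggml_fp16_t`, quant `i` keeps its low 4 bits in nibble `i` of `qs` and its 5th bit in bit `i` of `qh`, and the scale is chosen so that the largest-magnitude value maps to the end of the 5-bit range. The `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK5_0 32

// Hypothetical stand-in for block_q5_0: float replaces ggml_fp16_t.
typedef struct {
    float   d;              // delta (scale)
    uint8_t qh[4];          // 5th bit of each quant, 32 bits in total
    uint8_t qs[QK5_0 / 2];  // low 4 bits of each quant, two per byte
} block_q5_0_sketch;

// Quantize 32 floats into one block (assumed scheme: x ~ d * (q - 16), q in [0, 31]).
static void quantize_block_q5_0_sketch(const float *x, block_q5_0_sketch *b) {
    assert(x != NULL && b != NULL);

    // Find the value with the largest magnitude, keeping its sign.
    float amax = 0.0f, max = 0.0f;
    for (int i = 0; i < QK5_0; i++) {
        const float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) { amax = a; max = x[i]; }
    }

    const float d  = max / -16.0f;               // largest-magnitude value maps to q = 0
    const float id = d != 0.0f ? 1.0f / d : 0.0f;

    uint32_t qh = 0;
    memset(b->qs, 0, sizeof(b->qs));
    for (int i = 0; i < QK5_0; i++) {
        int q = (int)(x[i] * id + 16.5f);        // round to nearest, shift into [0, 32]
        if (q > 31) q = 31;                      // clamp the one value that can reach 32
        b->qs[i / 2] |= (uint8_t)((q & 0x0F) << (4 * (i & 1)));  // low nibble
        qh |= (uint32_t)((q >> 4) & 1) << i;                      // 5th bit
    }
    b->d = d;
    memcpy(b->qh, &qh, sizeof(qh));
}

// Rebuild 32 floats from one block.
static void dequantize_block_q5_0_sketch(const block_q5_0_sketch *b, float *y) {
    assert(b != NULL && y != NULL);

    uint32_t qh;
    memcpy(&qh, b->qh, sizeof(qh));              // reassemble the 32 high bits
    for (int i = 0; i < QK5_0; i++) {
        const int lo = (b->qs[i / 2] >> (4 * (i & 1))) & 0x0F;
        const int hi = (int)((qh >> i) & 1u);
        const int q  = lo | (hi << 4);           // 5-bit quant in [0, 31]
        y[i] = b->d * (float)(q - 16);
    }
}
```

The `memcpy` in and out of `qh` avoids unaligned `uint32_t` access, which is one plausible reason to declare the field as four bytes.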

On an M1 Pro, it evaluates at about 53 ms / token for the 7B model. This format is bigger than Q4_0 and Q4_2.

Perplexity for 7B: 6.0139

main: seed = 1682523351
llama.cpp: loading model from ../models/7B/ggml-model-q5_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4525003.11 KB
llama_model_load_internal: mem required  = 6210.95 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 64 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.90 seconds per pass - ETA 20 minutes
[1]4.2484,[2]4.7547,[3]5.6316,[4]6.2345,[5]6.3575,[6]6.3361,[7]6.5288,[8]6.6259,[9]6.9688,[10]7.2116,[11]7.4185,[12]7.4477,[13]7.3615,[14]7.4028,[15]7.6442,[16]7.2662,[17]7.1544,[18]7.1013,[19]6.7447,[20]6.7341,[21]6.6423,[22]6.4727,[23]6.4417,[24]6.3490,[25]6.3524,[26]6.1948,[27]6.0225,[28]5.9267,[29]5.8384,[30]5.6840,[31]5.6553,[32]5.6751,[33]5.6167,[34]5.6509,[35]5.6727,[36]5.7108,[37]5.7162,[38]5.7249,[39]5.7569,[40]5.8063,[41]5.8181,[42]5.8563,[43]5.8172,[44]5.8757,[45]5.8774,[46]5.8511,[47]5.8708,[48]5.8464,[49]5.8466,[50]5.8085,[51]5.8044,[52]5.7966,[53]5.8401,[54]5.8246,[55]5.8020,[56]5.8319,[57]5.8511,[58]5.8723,[59]5.8888,[60]5.9307,[61]5.9239,[62]5.9817,[63]6.0119,[64]6.0250,[65]6.0682,[66]6.0762,[67]6.0930,[68]6.1060,[69]6.1291,[70]6.1593,[71]6.1827,[72]6.2131,[73]6.2709,[74]6.2743,[75]6.2872,[76]6.2990,[77]6.3104,[78]6.2966,[79]6.3240,[80]6.3168,[81]6.3290,[82]6.3332,[83]6.2818,[84]6.2637,[85]6.2520,[86]6.2305,[87]6.1659,[88]6.1399,[89]6.1208,[90]6.1060,[91]6.1289,[92]6.1233,[93]6.1227,[94]6.1208,[95]6.1480,[96]6.1481,[97]6.1437,[98]6.1379,[99]6.1246,[100]6.1235,[101]6.1472,[102]6.1415,[103]6.1609,[104]6.1672,[105]6.1674,[106]6.1840,[107]6.1833,[108]6.1953,[109]6.1898,[110]6.1861,[111]6.2078,[112]6.2281,[113]6.2300,[114]6.2262,[115]6.2329,[116]6.2228,[117]6.2279,[118]6.2566,[119]6.2776,[120]6.3119,[121]6.3264,[122]6.3510,[123]6.3874,[124]6.4045,[125]6.3951,[126]6.4335,[127]6.4703,[128]6.5000,[129]6.4848,[130]6.4939,[131]6.4899,[132]6.4836,[133]6.4700,[134]6.4798,[135]6.4761,[136]6.4651,[137]6.4581,[138]6.4401,[139]6.4302,[140]6.4270,[141]6.3973,[142]6.3939,[143]6.3640,[144]6.3438,[145]6.3341,[146]6.3221,[147]6.3254,[148]6.3252,[149]6.3196,[150]6.3152,[151]6.3173,[152]6.3077,[153]6.2910,[154]6.2824,[155]6.2890,[156]6.2838,[157]6.3001,[158]6.3039,[159]6.3088,[160]6.3113,[161]6.3228,[162]6.2941,[163]6.2831,[164]6.2598,[165]6.2292,[166]6.2024,[167]6.1659,[168]6.1355,[169]6.1222,[170]6.1113,[171]6.0850,[172]6.0680,[173]6.0515,[174]6.0219,[175]6.0007,[176]5.9
895,[177]5.9700,[178]5.9476,[179]5.9303,[180]5.9207,[181]5.8998,[182]5.8821,[183]5.8682,[184]5.8678,[185]5.8605,[186]5.8607,[187]5.8668,[188]5.8631,[189]5.8800,[190]5.8808,[191]5.9013,[192]5.9171,[193]5.9332,[194]5.9440,[195]5.9652,[196]5.9808,[197]6.0014,[198]6.0161,[199]6.0190,[200]6.0240,[201]6.0190,[202]6.0373,[203]6.0446,[204]6.0430,[205]6.0534,[206]6.0602,[207]6.0560,[208]6.0648,[209]6.0689,[210]6.0739,[211]6.0842,[212]6.0916,[213]6.1022,[214]6.1043,[215]6.1072,[216]6.1210,[217]6.1388,[218]6.1515,[219]6.1514,[220]6.1479,[221]6.1431,[222]6.1408,[223]6.1310,[224]6.1242,[225]6.1201,[226]6.1407,[227]6.1492,[228]6.1545,[229]6.1608,[230]6.1582,[231]6.1744,[232]6.1626,[233]6.1464,[234]6.1317,[235]6.1126,[236]6.1058,[237]6.0962,[238]6.0987,[239]6.0844,[240]6.0742,[241]6.0768,[242]6.0802,[243]6.0784,[244]6.0674,[245]6.0641,[246]6.0532,[247]6.0416,[248]6.0345,[249]6.0322,[250]6.0368,[251]6.0298,[252]6.0264,[253]6.0170,[254]6.0116,[255]6.0000,[256]5.9825,[257]5.9702,[258]5.9622,[259]5.9603,[260]5.9523,[261]5.9478,[262]5.9425,[263]5.9367,[264]5.9148,[265]5.9142,[266]5.9126,[267]5.9060,[268]5.9154,[269]5.9131,[270]5.9141,[271]5.9219,[272]5.9253,[273]5.9252,[274]5.9277,[275]5.9362,[276]5.9423,[277]5.9579,[278]5.9679,[279]5.9771,[280]5.9799,[281]5.9897,[282]5.9957,[283]6.0103,[284]6.0183,[285]6.0268,[286]6.0399,[287]6.0392,[288]6.0455,[289]6.0373,[290]6.0221,[291]6.0074,[292]5.9927,[293]5.9794,[294]5.9817,[295]5.9804,[296]5.9848,[297]5.9835,[298]5.9864,[299]5.9839,[300]5.9735,[301]5.9736,[302]5.9659,[303]5.9570,[304]5.9485,[305]5.9450,[306]5.9325,[307]5.9347,[308]5.9381,[309]5.9224,[310]5.9169,[311]5.9105,[312]5.9130,[313]5.9078,[314]5.9061,[315]5.8907,[316]5.8854,[317]5.8697,[318]5.8496,[319]5.8614,[320]5.8732,[321]5.8779,[322]5.8741,[323]5.8675,[324]5.8646,[325]5.8744,[326]5.8745,[327]5.8768,[328]5.8806,[329]5.8864,[330]5.8888,[331]5.9009,[332]5.8980,[333]5.9046,[334]5.8992,[335]5.8932,[336]5.8969,[337]5.8943,[338]5.8933,[339]5.8882,[340]5.8840,[341]5.8921,[342]5.8947,[343
]5.8994,[344]5.8996,[345]5.9001,[346]5.8977,[347]5.9012,[348]5.9044,[349]5.9067,[350]5.9034,[351]5.9042,[352]5.9046,[353]5.8989,[354]5.8991,[355]5.9041,[356]5.9069,[357]5.9036,[358]5.9126,[359]5.9150,[360]5.9116,[361]5.9112,[362]5.9180,[363]5.9290,[364]5.9354,[365]5.9405,[366]5.9415,[367]5.9496,[368]5.9472,[369]5.9480,[370]5.9495,[371]5.9441,[372]5.9489,[373]5.9536,[374]5.9518,[375]5.9520,[376]5.9588,[377]5.9543,[378]5.9570,[379]5.9628,[380]5.9551,[381]5.9519,[382]5.9471,[383]5.9465,[384]5.9459,[385]5.9449,[386]5.9444,[387]5.9443,[388]5.9407,[389]5.9354,[390]5.9286,[391]5.9210,[392]5.9171,[393]5.9157,[394]5.9183,[395]5.9171,[396]5.9099,[397]5.9167,[398]5.9206,[399]5.9285,[400]5.9288,[401]5.9302,[402]5.9312,[403]5.9331,[404]5.9394,[405]5.9302,[406]5.9272,[407]5.9267,[408]5.9283,[409]5.9398,[410]5.9506,[411]5.9615,[412]5.9771,[413]5.9875,[414]5.9950,[415]6.0003,[416]6.0078,[417]6.0197,[418]6.0234,[419]6.0302,[420]6.0391,[421]6.0505,[422]6.0545,[423]6.0617,[424]6.0719,[425]6.0805,[426]6.0869,[427]6.0912,[428]6.0997,[429]6.1048,[430]6.1131,[431]6.1270,[432]6.1308,[433]6.1302,[434]6.1262,[435]6.1271,[436]6.1297,[437]6.1392,[438]6.1467,[439]6.1436,[440]6.1426,[441]6.1377,[442]6.1362,[443]6.1377,[444]6.1378,[445]6.1361,[446]6.1386,[447]6.1417,[448]6.1458,[449]6.1432,[450]6.1442,[451]6.1402,[452]6.1267,[453]6.1184,[454]6.1129,[455]6.1138,[456]6.1184,[457]6.1204,[458]6.1181,[459]6.1187,[460]6.1272,[461]6.1247,[462]6.1232,[463]6.1274,[464]6.1262,[465]6.1234,[466]6.1157,[467]6.1158,[468]6.1155,[469]6.1175,[470]6.1179,[471]6.1131,[472]6.1174,[473]6.1121,[474]6.1135,[475]6.1075,[476]6.1092,[477]6.1020,[478]6.1010,[479]6.1070,[480]6.1113,[481]6.1133,[482]6.1088,[483]6.1046,[484]6.1065,[485]6.1049,[486]6.0994,[487]6.0992,[488]6.0971,[489]6.0926,[490]6.0904,[491]6.0875,[492]6.0820,[493]6.0792,[494]6.0777,[495]6.0774,[496]6.0738,[497]6.0683,[498]6.0665,[499]6.0624,[500]6.0532,[501]6.0467,[502]6.0470,[503]6.0463,[504]6.0378,[505]6.0400,[506]6.0406,[507]6.0350,[508]6.0310,[509]6.0304,
[510]6.0339,[511]6.0384,[512]6.0419,[513]6.0439,[514]6.0502,[515]6.0448,[516]6.0439,[517]6.0446,[518]6.0445,[519]6.0473,[520]6.0499,[521]6.0512,[522]6.0538,[523]6.0544,[524]6.0600,[525]6.0632,[526]6.0643,[527]6.0661,[528]6.0610,[529]6.0615,[530]6.0564,[531]6.0551,[532]6.0596,[533]6.0619,[534]6.0606,[535]6.0629,[536]6.0575,[537]6.0554,[538]6.0603,[539]6.0614,[540]6.0651,[541]6.0654,[542]6.0666,[543]6.0681,[544]6.0691,[545]6.0673,[546]6.0682,[547]6.0641,[548]6.0594,[549]6.0594,[550]6.0565,[551]6.0532,[552]6.0513,[553]6.0478,[554]6.0457,[555]6.0427,[556]6.0423,[557]6.0445,[558]6.0410,[559]6.0407,[560]6.0405,[561]6.0408,[562]6.0384,[563]6.0380,[564]6.0423,[565]6.0443,[566]6.0443,[567]6.0423,[568]6.0427,[569]6.0415,[570]6.0443,[571]6.0446,[572]6.0457,[573]6.0458,[574]6.0425,[575]6.0419,[576]6.0417,[577]6.0402,[578]6.0383,[579]6.0389,[580]6.0326,[581]6.0291,[582]6.0280,[583]6.0288,[584]6.0291,[585]6.0216,[586]6.0149,[587]6.0154,[588]6.0202,[589]6.0255,[590]6.0285,[591]6.0305,[592]6.0295,[593]6.0265,[594]6.0274,[595]6.0252,[596]6.0284,[597]6.0265,[598]6.0236,[599]6.0257,[600]6.0253,[601]6.0240,[602]6.0252,[603]6.0281,[604]6.0290,[605]6.0323,[606]6.0345,[607]6.0328,[608]6.0295,[609]6.0304,[610]6.0339,[611]6.0321,[612]6.0346,[613]6.0311,[614]6.0262,[615]6.0191,[616]6.0218,[617]6.0160,[618]6.0112,[619]6.0058,[620]5.9924,[621]5.9857,[622]5.9840,[623]5.9856,[624]5.9860,[625]5.9861,[626]5.9850,[627]5.9872,[628]5.9874,[629]5.9869,[630]5.9900,[631]5.9954,[632]6.0009,[633]5.9995,[634]6.0030,[635]6.0036,[636]6.0002,[637]5.9969,[638]5.9994,[639]5.9963,[640]5.9972,[641]5.9975,[642]6.0040,[643]6.0063,[644]6.0075,[645]6.0056,[646]6.0096,[647]6.0059,[648]6.0067,[649]6.0068,[650]6.0106,[651]6.0159,[652]6.0168,[653]6.0206,[654]6.0144,[655]6.0139,
llama_print_timings:        load time =  4416.87 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1163995.97 ms / 335360 tokens (    3.47 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1198461.81 ms

Q5_1

#define QK5_1 32
typedef struct {
    ggml_fp16_t d;          // delta
    ggml_fp16_t m;          // min
    uint32_t qh;            // 5th (high) bit of each quant, one bit per quant
    uint8_t qs[QK5_1 / 2];  // nibbles / quants
} block_q5_1;
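
Compared with Q5_0, the extra `m` field allows an asymmetric range. A minimal sketch of the round trip, under the same kind of assumptions as any illustration here: `float` replaces `ggml_fp16_t`, quant `i` uses nibble `i` of `qs` and bit `i` of `qh`, and the reconstruction is assumed to be `x ~ d * q + m` with `q` in [0, 31]. The `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK5_1 32

// Hypothetical stand-in for block_q5_1: float replaces ggml_fp16_t.
typedef struct {
    float    d;              // delta (scale)
    float    m;              // min (offset)
    uint32_t qh;             // 5th bit of each quant
    uint8_t  qs[QK5_1 / 2];  // low 4 bits of each quant, two per byte
} block_q5_1_sketch;

// Quantize 32 floats into one block: q = round((x - min) / d), d = (max - min) / 31.
static void quantize_block_q5_1_sketch(const float *x, block_q5_1_sketch *b) {
    assert(x != NULL && b != NULL);

    float min = x[0], max = x[0];
    for (int i = 1; i < QK5_1; i++) {
        if (x[i] < min) min = x[i];
        if (x[i] > max) max = x[i];
    }

    const float d  = (max - min) / 31.0f;        // 31 steps span the whole range
    const float id = d != 0.0f ? 1.0f / d : 0.0f;

    b->d  = d;
    b->m  = min;
    b->qh = 0;
    memset(b->qs, 0, sizeof(b->qs));
    for (int i = 0; i < QK5_1; i++) {
        int q = (int)((x[i] - min) * id + 0.5f); // round to nearest, result is >= 0
        if (q > 31) q = 31;                      // guard against rounding overshoot
        b->qs[i / 2] |= (uint8_t)((q & 0x0F) << (4 * (i & 1)));
        b->qh |= (uint32_t)((q >> 4) & 1) << i;
    }
}

// Rebuild 32 floats from one block.
static void dequantize_block_q5_1_sketch(const block_q5_1_sketch *b, float *y) {
    assert(b != NULL && y != NULL);

    for (int i = 0; i < QK5_1; i++) {
        const int lo = (b->qs[i / 2] >> (4 * (i & 1))) & 0x0F;
        const int hi = (int)((b->qh >> i) & 1u);
        y[i] = b->d * (float)(lo | (hi << 4)) + b->m;
    }
}
```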

This format is the same size as Q4_1 and Q4_3. On an M1 Pro, it evaluates at about 55 ms / token for the 7B model.
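
The size claims can be checked from the two struct definitions alone. The sketch below assumes `ggml_fp16_t` is a 16-bit type (a `uint16_t` stand-in named `fp16_bits_t` is used here) and a typical ABI where `uint32_t` is 4-byte aligned; the `*_layout` names and `bits_per_weight` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK5_0 32
#define QK5_1 32

typedef uint16_t fp16_bits_t;  // assumption: 16-bit stand-in for ggml_fp16_t

// Same field order and widths as block_q5_0.
typedef struct {
    fp16_bits_t d;
    uint8_t     qh[4];
    uint8_t     qs[QK5_0 / 2];
} q5_0_layout;

// Same field order and widths as block_q5_1.
typedef struct {
    fp16_bits_t d;
    fp16_bits_t m;
    uint32_t    qh;
    uint8_t     qs[QK5_1 / 2];
} q5_1_layout;

// Storage cost per weight, in bits, for a block holding weights_per_block values.
static double bits_per_weight(size_t block_bytes, int weights_per_block) {
    assert(weights_per_block > 0);
    return 8.0 * (double)block_bytes / (double)weights_per_block;
}
```

Under these assumptions, `sizeof(q5_0_layout)` is 22 bytes (5.5 bits per weight) and `sizeof(q5_1_layout)` is 24 bytes (6.0 bits per weight). Declaring `qh` as `uint8_t[4]` in Q5_0 matters for this: a `uint32_t` after a lone 2-byte `d` would typically add 2 bytes of padding and bring the block to 24 bytes.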

The AVX implementation might make use of the following trick: https://stackoverflow.com/a/24242696
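
Whatever trick ends up being used, the vectorized kernel has to turn the 32 packed bits of `qh` into one byte per quant so they can be OR-ed onto the nibbles. Below is a portable sketch of that step: a plain reference loop plus a 64-bit "broadcast, mask, compare" variant that handles 8 bits at a time. The second function is an assumption about the kind of trick meant, not a summary of the linked answer, and all function names are hypothetical. It could serve as a test oracle for a SIMD version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reference: byte i of out becomes 0x10 if bit i of qh is set, else 0x00.
static void expand_bits_ref(uint32_t qh, uint8_t out[32]) {
    assert(out != NULL);
    for (int i = 0; i < 32; i++) {
        out[i] = (uint8_t)(((qh >> i) & 1u) << 4);
    }
}

// Expand 8 bits into 8 bytes inside one 64-bit word (byte k <- bit k).
static uint64_t expand_8_bits(uint8_t bits) {
    // Broadcast: copy the input byte into all 8 byte lanes.
    const uint64_t spread = (uint64_t)bits * 0x0101010101010101ULL;
    // Mask: lane k keeps only bit k of the input.
    const uint64_t picked = spread & 0x8040201008040201ULL;
    // Compare with zero: adding 0x7F sets a lane's top bit only if the lane is nonzero.
    const uint64_t ones = ((picked + 0x7F7F7F7F7F7F7F7FULL) >> 7) & 0x0101010101010101ULL;
    // Move each 0/1 flag to bit 4, where the 5th quant bit belongs.
    return ones << 4;
}

// Same result as expand_bits_ref, built from four 8-bit expansions.
static void expand_bits_swar(uint32_t qh, uint8_t out[32]) {
    assert(out != NULL);
    for (int g = 0; g < 4; g++) {
        const uint64_t v = expand_8_bits((uint8_t)(qh >> (8 * g)));
        for (int k = 0; k < 8; k++) {
            out[8 * g + k] = (uint8_t)(v >> (8 * k));  // extract lane k, endian-independent
        }
    }
}
```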

Perplexity for 7B: 5.9934

main: seed = 1682491079
llama.cpp: loading model from ../models/7B/ggml-model-q5_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936267.11 KB
llama_model_load_internal: mem required  = 6612.57 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
4.47 seconds per pass - ETA 48 minutes
[1]4.2726,[2]4.7565,[3]5.6331,[4]6.2042,[5]6.3451,[6]6.3059,[7]6.4909,[8]6.5871,[9]6.9243,[10]7.1597,[11]7.3774,[12]7.4015,[13]7.3209,[14]7.3676,[15]7.6199,[16]7.2420,[17]7.1286,[18]7.0729,[19]6.7181,[20]6.7082,[21]6.6191,[22]6.4438,[23]6.4184,[24]6.3280,[25]6.3274,[26]6.1686,[27]5.9965,[28]5.8979,[29]5.8120,[30]5.6595,[31]5.6332,[32]5.6517,[33]5.5956,[34]5.6265,[35]5.6486,[36]5.6873,[37]5.6899,[38]5.7015,[39]5.7330,[40]5.7819,[41]5.7887,[42]5.8273,[43]5.7886,[44]5.8450,[45]5.8481,[46]5.8224,[47]5.8428,[48]5.8164,[49]5.8186,[50]5.7792,[51]5.7755,[52]5.7657,[53]5.8109,[54]5.7964,[55]5.7747,[56]5.8023,[57]5.8232,[58]5.8428,[59]5.8607,[60]5.9020,[61]5.8953,[62]5.9527,[63]5.9840,[64]5.9978,[65]6.0403,[66]6.0480,[67]6.0658,[68]6.0795,[69]6.1037,[70]6.1335,[71]6.1559,[72]6.1870,[73]6.2448,[74]6.2483,[75]6.2627,[76]6.2745,[77]6.2864,[78]6.2724,[79]6.3003,[80]6.2942,[81]6.3078,[82]6.3123,[83]6.2612,[84]6.2434,[85]6.2310,[86]6.2091,[87]6.1446,[88]6.1200,[89]6.1001,[90]6.0861,[91]6.1102,[92]6.1045,[93]6.1043,[94]6.1014,[95]6.1292,[96]6.1288,[97]6.1234,[98]6.1171,[99]6.1039,[100]6.1026,[101]6.1260,[102]6.1220,[103]6.1422,[104]6.1490,[105]6.1488,[106]6.1662,[107]6.1657,[108]6.1787,[109]6.1732,[110]6.1700,[111]6.1917,[112]6.2121,[113]6.2146,[114]6.2101,[115]6.2159,[116]6.2056,[117]6.2103,[118]6.2398,[119]6.2614,[120]6.2956,[121]6.3101,[122]6.3337,[123]6.3701,[124]6.3873,[125]6.3786,[126]6.4164,[127]6.4521,[128]6.4821,[129]6.4672,[130]6.4757,[131]6.4718,[132]6.4630,[133]6.4508,[134]6.4598,[135]6.4561,[136]6.4451,[137]6.4376,[138]6.4205,[139]6.4098,[140]6.4064,[141]6.3775,[142]6.3740,[143]6.3440,[144]6.3233,[145]6.3139,[146]6.3020,[147]6.3048,[148]6.3045,[149]6.2989,[150]6.2941,[151]6.2961,[152]6.2859,[153]6.2701,[154]6.2611,[155]6.2679,[156]6.2632,[157]6.2792,[158]6.2835,[159]6.2884,[160]6.2909,[161]6.3036,[162]6.2761,[163]6.2647,[164]6.2420,[165]6.2117,[166]6.1852,[167]6.1488,[168]6.1189,[169]6.1056,[170]6.0951,[171]6.0693,[172]6.0527,[173]6.0368,[174]6.0077,[175]5.9864,[176]5.9
749,[177]5.9553,[178]5.9332,[179]5.9165,[180]5.9070,[181]5.8855,[182]5.8680,[183]5.8547,[184]5.8541,[185]5.8471,[186]5.8478,[187]5.8534,[188]5.8494,[189]5.8663,[190]5.8672,[191]5.8874,[192]5.9032,[193]5.9191,[194]5.9298,[195]5.9514,[196]5.9668,[197]5.9877,[198]6.0027,[199]6.0056,[200]6.0104,[201]6.0051,[202]6.0232,[203]6.0304,[204]6.0287,[205]6.0390,[206]6.0462,[207]6.0426,[208]6.0506,[209]6.0543,[210]6.0596,[211]6.0700,[212]6.0769,[213]6.0873,[214]6.0898,[215]6.0925,[216]6.1063,[217]6.1243,[218]6.1372,[219]6.1368,[220]6.1330,[221]6.1274,[222]6.1253,[223]6.1157,[224]6.1089,[225]6.1052,[226]6.1252,[227]6.1332,[228]6.1387,[229]6.1447,[230]6.1416,[231]6.1583,[232]6.1464,[233]6.1301,[234]6.1153,[235]6.0955,[236]6.0891,[237]6.0797,[238]6.0823,[239]6.0676,[240]6.0576,[241]6.0593,[242]6.0630,[243]6.0612,[244]6.0501,[245]6.0469,[246]6.0357,[247]6.0245,[248]6.0174,[249]6.0149,[250]6.0194,[251]6.0127,[252]6.0091,[253]5.9995,[254]5.9941,[255]5.9830,[256]5.9653,[257]5.9534,[258]5.9457,[259]5.9432,[260]5.9354,[261]5.9313,[262]5.9261,[263]5.9209,[264]5.8991,[265]5.8985,[266]5.8963,[267]5.8899,[268]5.8988,[269]5.8969,[270]5.8974,[271]5.9052,[272]5.9085,[273]5.9088,[274]5.9112,[275]5.9192,[276]5.9254,[277]5.9410,[278]5.9508,[279]5.9598,[280]5.9624,[281]5.9722,[282]5.9780,[283]5.9927,[284]6.0004,[285]6.0087,[286]6.0218,[287]6.0211,[288]6.0267,[289]6.0185,[290]6.0030,[291]5.9883,[292]5.9739,[293]5.9609,[294]5.9629,[295]5.9619,[296]5.9666,[297]5.9652,[298]5.9680,[299]5.9656,[300]5.9551,[301]5.9552,[302]5.9477,[303]5.9390,[304]5.9306,[305]5.9271,[306]5.9146,[307]5.9170,[308]5.9200,[309]5.9045,[310]5.8993,[311]5.8931,[312]5.8954,[313]5.8900,[314]5.8883,[315]5.8731,[316]5.8680,[317]5.8523,[318]5.8324,[319]5.8440,[320]5.8560,[321]5.8602,[322]5.8562,[323]5.8497,[324]5.8470,[325]5.8572,[326]5.8572,[327]5.8595,[328]5.8633,[329]5.8690,[330]5.8718,[331]5.8836,[332]5.8808,[333]5.8874,[334]5.8822,[335]5.8763,[336]5.8801,[337]5.8777,[338]5.8769,[339]5.8718,[340]5.8677,[341]5.8756,[342]5.8786,[343
]5.8832,[344]5.8834,[345]5.8837,[346]5.8812,[347]5.8851,[348]5.8883,[349]5.8905,[350]5.8873,[351]5.8881,[352]5.8884,[353]5.8827,[354]5.8831,[355]5.8882,[356]5.8912,[357]5.8877,[358]5.8967,[359]5.8994,[360]5.8959,[361]5.8954,[362]5.9023,[363]5.9135,[364]5.9194,[365]5.9243,[366]5.9256,[367]5.9341,[368]5.9317,[369]5.9326,[370]5.9342,[371]5.9290,[372]5.9336,[373]5.9381,[374]5.9366,[375]5.9368,[376]5.9433,[377]5.9389,[378]5.9416,[379]5.9473,[380]5.9395,[381]5.9361,[382]5.9314,[383]5.9308,[384]5.9304,[385]5.9293,[386]5.9290,[387]5.9288,[388]5.9252,[389]5.9201,[390]5.9134,[391]5.9059,[392]5.9018,[393]5.9004,[394]5.9029,[395]5.9016,[396]5.8946,[397]5.9016,[398]5.9053,[399]5.9129,[400]5.9131,[401]5.9146,[402]5.9158,[403]5.9176,[404]5.9238,[405]5.9143,[406]5.9112,[407]5.9105,[408]5.9121,[409]5.9233,[410]5.9344,[411]5.9455,[412]5.9610,[413]5.9716,[414]5.9790,[415]5.9843,[416]5.9918,[417]6.0035,[418]6.0069,[419]6.0136,[420]6.0222,[421]6.0337,[422]6.0376,[423]6.0445,[424]6.0550,[425]6.0634,[426]6.0697,[427]6.0739,[428]6.0821,[429]6.0871,[430]6.0952,[431]6.1090,[432]6.1126,[433]6.1119,[434]6.1079,[435]6.1090,[436]6.1115,[437]6.1211,[438]6.1284,[439]6.1254,[440]6.1246,[441]6.1199,[442]6.1185,[443]6.1197,[444]6.1202,[445]6.1184,[446]6.1208,[447]6.1238,[448]6.1280,[449]6.1256,[450]6.1265,[451]6.1228,[452]6.1093,[453]6.1006,[454]6.0949,[455]6.0958,[456]6.1004,[457]6.1024,[458]6.1000,[459]6.1005,[460]6.1089,[461]6.1062,[462]6.1049,[463]6.1089,[464]6.1079,[465]6.1052,[466]6.0977,[467]6.0981,[468]6.0979,[469]6.0999,[470]6.1005,[471]6.0958,[472]6.1001,[473]6.0948,[474]6.0960,[475]6.0902,[476]6.0920,[477]6.0848,[478]6.0837,[479]6.0895,[480]6.0941,[481]6.0959,[482]6.0915,[483]6.0873,[484]6.0891,[485]6.0871,[486]6.0815,[487]6.0812,[488]6.0790,[489]6.0743,[490]6.0720,[491]6.0692,[492]6.0636,[493]6.0608,[494]6.0590,[495]6.0584,[496]6.0547,[497]6.0491,[498]6.0474,[499]6.0433,[500]6.0340,[501]6.0274,[502]6.0276,[503]6.0270,[504]6.0184,[505]6.0206,[506]6.0214,[507]6.0157,[508]6.0117,[509]6.0112,
[510]6.0145,[511]6.0192,[512]6.0226,[513]6.0245,[514]6.0305,[515]6.0252,[516]6.0243,[517]6.0253,[518]6.0248,[519]6.0278,[520]6.0301,[521]6.0312,[522]6.0338,[523]6.0343,[524]6.0400,[525]6.0431,[526]6.0440,[527]6.0455,[528]6.0406,[529]6.0411,[530]6.0362,[531]6.0350,[532]6.0395,[533]6.0417,[534]6.0399,[535]6.0421,[536]6.0369,[537]6.0349,[538]6.0398,[539]6.0409,[540]6.0446,[541]6.0449,[542]6.0459,[543]6.0475,[544]6.0486,[545]6.0468,[546]6.0478,[547]6.0437,[548]6.0391,[549]6.0390,[550]6.0361,[551]6.0327,[552]6.0306,[553]6.0271,[554]6.0251,[555]6.0221,[556]6.0218,[557]6.0242,[558]6.0206,[559]6.0204,[560]6.0202,[561]6.0205,[562]6.0183,[563]6.0180,[564]6.0224,[565]6.0244,[566]6.0242,[567]6.0220,[568]6.0226,[569]6.0212,[570]6.0240,[571]6.0245,[572]6.0253,[573]6.0253,[574]6.0218,[575]6.0213,[576]6.0213,[577]6.0196,[578]6.0177,[579]6.0181,[580]6.0117,[581]6.0080,[582]6.0070,[583]6.0079,[584]6.0081,[585]6.0007,[586]5.9940,[587]5.9947,[588]5.9994,[589]6.0049,[590]6.0079,[591]6.0100,[592]6.0089,[593]6.0056,[594]6.0066,[595]6.0042,[596]6.0076,[597]6.0054,[598]6.0028,[599]6.0048,[600]6.0044,[601]6.0030,[602]6.0040,[603]6.0069,[604]6.0078,[605]6.0111,[606]6.0132,[607]6.0116,[608]6.0082,[609]6.0091,[610]6.0127,[611]6.0111,[612]6.0137,[613]6.0101,[614]6.0053,[615]5.9983,[616]6.0008,[617]5.9949,[618]5.9903,[619]5.9850,[620]5.9717,[621]5.9650,[622]5.9634,[623]5.9650,[624]5.9655,[625]5.9658,[626]5.9647,[627]5.9670,[628]5.9672,[629]5.9668,[630]5.9699,[631]5.9754,[632]5.9810,[633]5.9795,[634]5.9829,[635]5.9834,[636]5.9800,[637]5.9767,[638]5.9791,[639]5.9760,[640]5.9770,[641]5.9771,[642]5.9836,[643]5.9857,[644]5.9869,[645]5.9851,[646]5.9890,[647]5.9850,[648]5.9860,[649]5.9862,[650]5.9901,[651]5.9952,[652]5.9963,[653]6.0002,[654]5.9940,[655]5.9934,
llama_print_timings:        load time =  6541.18 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 2917328.96 ms / 335360 tokens (    8.70 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 2951478.31 ms

TODO:

  • [x] cuBLAS perplexity
  • [x] dot scalar
  • [x] dot ARM
  • [ ] dot AVX

ggerganov · Apr 26 '23 07:04