
Measure perplexity delta between Q4_0 and F16 "output" tensor

Open · ggerganov opened this issue 1 year ago · 1 comment

The last tensor of the transformer (called "output" in llama.cpp) is one of the largest in the model:

https://github.com/ggerganov/llama.cpp/blob/0ad964631f9b3970f1936008fcfb1eadef59c7ed/llama.cpp#L945

I wonder how much the perplexity improves if we keep this particular tensor in F16 instead of quantizing it.
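For context, the per-chunk numbers reported below are the exponentiated mean negative log-likelihood of the evaluated tokens. A minimal sketch of the metric itself (toy probabilities, not llama.cpp internals):

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood over the evaluated tokens."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that assigns every token probability 1/4 scores perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))   # ~ 4.0
```

Lower is better, so the question is how much the final [655] value drops when the "output" tensor stays in F16.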

Results

Q4_0 M1 Pro (with BLAS) [655]6.2838 (i.e. reference)
$  make clean && make -j perplexity && time ./perplexity -m ./models/7B/ggml-model-q4_0.bin -f ./build/wiki.test.raw -t 8
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

rm -vf *.o main quantize quantize-stats perplexity embedding benchmark-q4_0-matmult
common.o
ggml.o
llama.o
main
quantize
perplexity
embedding
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c llama.cpp -o llama.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c examples/common.cpp -o common.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity  -framework Accelerate
main: seed = 1681463663
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
10.60 seconds per pass - ETA 1.93 hours
[1]4.3802,[2]4.9555,[3]5.8269,[4]6.4692,[5]6.5435,[6]6.5411,[7]6.7174,[8]6.8069,[9]7.1756,[10]7.4121,[11]7.6567,[12]7.6957,[13]7.6058,[14]7.6820,[15]7.9366,[16]7.5419,[17]7.4189,[18]7.3798,[19]7.0077,[20]6.9948,[21]6.8969,[22]6.7125,[23]6.6744,[24]6.5868,[25]6.5871,[26]6.4149,[27]6.2349,[28]6.1341,[29]6.0498,[30]5.8938,[31]5.8659,[32]5.8839,[33]5.8189,[34]5.8537,[35]5.8795,[36]5.9232,[37]5.9272,[38]5.9443,[39]5.9825,[40]6.0412,[41]6.0482,[42]6.0826,[43]6.0397,[44]6.0944,[45]6.0989,[46]6.0729,[47]6.0967,[48]6.0674,[49]6.0745,[50]6.0351,[51]6.0309,[52]6.0200,[53]6.0641,[54]6.0476,[55]6.0250,[56]6.0593,[57]6.0824,[58]6.1043,[59]6.1182,[60]6.1647,[61]6.1536,[62]6.2166,[63]6.2502,[64]6.2653,[65]6.3119,[66]6.3220,[67]6.3401,[68]6.3541,[69]6.3790,[70]6.4113,[71]6.4327,[72]6.4625,[73]6.5276,[74]6.5330,[75]6.5474,[76]6.5637,[77]6.5770,[78]6.5618,[79]6.5914,[80]6.5839,[81]6.5967,[82]6.6005,[83]6.5468,[84]6.5322,[85]6.5208,[86]6.4997,[87]6.4344,[88]6.4059,[89]6.3853,[90]6.3687,[91]6.3948,[92]6.3909,[93]6.3935,[94]6.3910,[95]6.4198,[96]6.4177,[97]6.4105,[98]6.4035,[99]6.3895,[100]6.3895,[101]6.4154,[102]6.4091,[103]6.4308,[104]6.4376,[105]6.4361,[106]6.4538,[107]6.4525,[108]6.4648,[109]6.4595,[110]6.4550,[111]6.4779,[112]6.4969,[113]6.4983,[114]6.4949,[115]6.5032,[116]6.4958,[117]6.5014,[118]6.5298,[119]6.5507,[120]6.5872,[121]6.6035,[122]6.6282,[123]6.6672,[124]6.6850,[125]6.6762,[126]6.7153,[127]6.7524,[128]6.7798,[129]6.7629,[130]6.7725,[131]6.7672,[132]6.7584,[133]6.7456,[134]6.7568,[135]6.7534,[136]6.7402,[137]6.7322,[138]6.7151,[139]6.7035,[140]6.7005,[141]6.6707,[142]6.6658,[143]6.6379,[144]6.6178,[145]6.6092,[146]6.5957,[147]6.6031,[148]6.6054,[149]6.5994,[150]6.5953,[151]6.5965,[152]6.5870,[153]6.5703,[154]6.5613,[155]6.5680,[156]6.5630,[157]6.5813,[158]6.5849,[159]6.5890,[160]6.5916,[161]6.6041,[162]6.5739,[163]6.5619,[164]6.5357,[165]6.5039,[166]6.4751,[167]6.4377,[168]6.4051,[169]6.3916,[170]6.3791,[171]6.3502,[172]6.3322,[173]6.3136,[174]6.2829,[175]6.2607,[176]6.2
505,[177]6.2295,[178]6.2059,[179]6.1887,[180]6.1798,[181]6.1574,[182]6.1382,[183]6.1239,[184]6.1238,[185]6.1165,[186]6.1182,[187]6.1236,[188]6.1200,[189]6.1384,[190]6.1393,[191]6.1597,[192]6.1760,[193]6.1938,[194]6.2054,[195]6.2263,[196]6.2434,[197]6.2655,[198]6.2810,[199]6.2840,[200]6.2885,[201]6.2844,[202]6.3049,[203]6.3115,[204]6.3114,[205]6.3224,[206]6.3302,[207]6.3262,[208]6.3346,[209]6.3398,[210]6.3449,[211]6.3547,[212]6.3620,[213]6.3727,[214]6.3762,[215]6.3802,[216]6.3951,[217]6.4129,[218]6.4264,[219]6.4267,[220]6.4231,[221]6.4168,[222]6.4133,[223]6.4024,[224]6.3958,[225]6.3910,[226]6.4125,[227]6.4212,[228]6.4271,[229]6.4338,[230]6.4294,[231]6.4462,[232]6.4332,[233]6.4160,[234]6.4004,[235]6.3845,[236]6.3768,[237]6.3664,[238]6.3698,[239]6.3536,[240]6.3433,[241]6.3466,[242]6.3503,[243]6.3488,[244]6.3368,[245]6.3342,[246]6.3221,[247]6.3097,[248]6.3030,[249]6.3010,[250]6.3057,[251]6.2980,[252]6.2946,[253]6.2844,[254]6.2804,[255]6.2688,[256]6.2496,[257]6.2385,[258]6.2299,[259]6.2279,[260]6.2197,[261]6.2154,[262]6.2095,[263]6.2050,[264]6.1858,[265]6.1850,[266]6.1835,[267]6.1766,[268]6.1862,[269]6.1843,[270]6.1850,[271]6.1928,[272]6.1974,[273]6.1969,[274]6.1983,[275]6.2073,[276]6.2128,[277]6.2288,[278]6.2397,[279]6.2483,[280]6.2518,[281]6.2617,[282]6.2678,[283]6.2825,[284]6.2902,[285]6.2997,[286]6.3144,[287]6.3138,[288]6.3198,[289]6.3107,[290]6.2956,[291]6.2802,[292]6.2644,[293]6.2505,[294]6.2530,[295]6.2524,[296]6.2567,[297]6.2553,[298]6.2579,[299]6.2551,[300]6.2439,[301]6.2440,[302]6.2359,[303]6.2282,[304]6.2204,[305]6.2180,[306]6.2047,[307]6.2072,[308]6.2104,[309]6.1941,[310]6.1880,[311]6.1816,[312]6.1838,[313]6.1782,[314]6.1769,[315]6.1604,[316]6.1562,[317]6.1395,[318]6.1179,[319]6.1298,[320]6.1428,[321]6.1466,[322]6.1421,[323]6.1355,[324]6.1331,[325]6.1431,[326]6.1430,[327]6.1451,[328]6.1494,[329]6.1554,[330]6.1579,[331]6.1703,[332]6.1671,[333]6.1741,[334]6.1682,[335]6.1617,[336]6.1655,[337]6.1625,[338]6.1612,[339]6.1555,[340]6.1511,[341]6.1589,[342]6.1613,[343
]6.1669,[344]6.1668,[345]6.1667,[346]6.1638,[347]6.1686,[348]6.1727,[349]6.1746,[350]6.1712,[351]6.1717,[352]6.1717,[353]6.1665,[354]6.1664,[355]6.1718,[356]6.1749,[357]6.1712,[358]6.1802,[359]6.1833,[360]6.1795,[361]6.1791,[362]6.1858,[363]6.1970,[364]6.2035,[365]6.2093,[366]6.2100,[367]6.2188,[368]6.2165,[369]6.2175,[370]6.2185,[371]6.2125,[372]6.2178,[373]6.2234,[374]6.2220,[375]6.2217,[376]6.2301,[377]6.2252,[378]6.2277,[379]6.2338,[380]6.2254,[381]6.2211,[382]6.2154,[383]6.2144,[384]6.2137,[385]6.2124,[386]6.2119,[387]6.2111,[388]6.2066,[389]6.2012,[390]6.1943,[391]6.1862,[392]6.1821,[393]6.1802,[394]6.1828,[395]6.1812,[396]6.1738,[397]6.1814,[398]6.1852,[399]6.1935,[400]6.1931,[401]6.1944,[402]6.1950,[403]6.1969,[404]6.2032,[405]6.1937,[406]6.1903,[407]6.1895,[408]6.1905,[409]6.2029,[410]6.2139,[411]6.2264,[412]6.2427,[413]6.2542,[414]6.2618,[415]6.2670,[416]6.2750,[417]6.2881,[418]6.2916,[419]6.2990,[420]6.3076,[421]6.3197,[422]6.3255,[423]6.3326,[424]6.3446,[425]6.3537,[426]6.3602,[427]6.3647,[428]6.3730,[429]6.3775,[430]6.3865,[431]6.4011,[432]6.4054,[433]6.4041,[434]6.3995,[435]6.4002,[436]6.4026,[437]6.4121,[438]6.4200,[439]6.4164,[440]6.4158,[441]6.4108,[442]6.4099,[443]6.4112,[444]6.4115,[445]6.4095,[446]6.4118,[447]6.4147,[448]6.4190,[449]6.4164,[450]6.4167,[451]6.4124,[452]6.4005,[453]6.3922,[454]6.3862,[455]6.3869,[456]6.3917,[457]6.3934,[458]6.3912,[459]6.3922,[460]6.4009,[461]6.3981,[462]6.3965,[463]6.4015,[464]6.4006,[465]6.3976,[466]6.3895,[467]6.3898,[468]6.3897,[469]6.3919,[470]6.3924,[471]6.3876,[472]6.3922,[473]6.3866,[474]6.3880,[475]6.3821,[476]6.3844,[477]6.3773,[478]6.3764,[479]6.3827,[480]6.3879,[481]6.3899,[482]6.3854,[483]6.3813,[484]6.3835,[485]6.3818,[486]6.3763,[487]6.3763,[488]6.3744,[489]6.3694,[490]6.3667,[491]6.3637,[492]6.3579,[493]6.3548,[494]6.3530,[495]6.3528,[496]6.3493,[497]6.3440,[498]6.3422,[499]6.3372,[500]6.3275,[501]6.3206,[502]6.3204,[503]6.3202,[504]6.3109,[505]6.3133,[506]6.3142,[507]6.3081,[508]6.3038,[509]6.3027,
[510]6.3066,[511]6.3113,[512]6.3148,[513]6.3166,[514]6.3232,[515]6.3177,[516]6.3169,[517]6.3180,[518]6.3181,[519]6.3211,[520]6.3238,[521]6.3255,[522]6.3283,[523]6.3294,[524]6.3357,[525]6.3393,[526]6.3405,[527]6.3426,[528]6.3372,[529]6.3376,[530]6.3329,[531]6.3319,[532]6.3367,[533]6.3390,[534]6.3372,[535]6.3395,[536]6.3341,[537]6.3318,[538]6.3366,[539]6.3378,[540]6.3417,[541]6.3426,[542]6.3433,[543]6.3447,[544]6.3459,[545]6.3437,[546]6.3443,[547]6.3398,[548]6.3343,[549]6.3345,[550]6.3318,[551]6.3280,[552]6.3260,[553]6.3217,[554]6.3195,[555]6.3166,[556]6.3163,[557]6.3186,[558]6.3146,[559]6.3142,[560]6.3137,[561]6.3139,[562]6.3120,[563]6.3120,[564]6.3163,[565]6.3180,[566]6.3177,[567]6.3155,[568]6.3160,[569]6.3144,[570]6.3170,[571]6.3176,[572]6.3186,[573]6.3188,[574]6.3151,[575]6.3147,[576]6.3145,[577]6.3135,[578]6.3114,[579]6.3122,[580]6.3056,[581]6.3018,[582]6.3008,[583]6.3016,[584]6.3020,[585]6.2943,[586]6.2875,[587]6.2877,[588]6.2927,[589]6.2985,[590]6.3015,[591]6.3037,[592]6.3022,[593]6.2985,[594]6.2996,[595]6.2973,[596]6.3010,[597]6.2987,[598]6.2949,[599]6.2971,[600]6.2969,[601]6.2954,[602]6.2972,[603]6.3001,[604]6.3012,[605]6.3044,[606]6.3065,[607]6.3048,[608]6.3013,[609]6.3019,[610]6.3056,[611]6.3037,[612]6.3062,[613]6.3026,[614]6.2975,[615]6.2898,[616]6.2928,[617]6.2865,[618]6.2814,[619]6.2757,[620]6.2615,[621]6.2542,[622]6.2525,[623]6.2540,[624]6.2545,[625]6.2544,[626]6.2529,[627]6.2550,[628]6.2555,[629]6.2552,[630]6.2586,[631]6.2650,[632]6.2704,[633]6.2687,[634]6.2721,[635]6.2726,[636]6.2694,[637]6.2659,[638]6.2686,[639]6.2657,[640]6.2667,[641]6.2669,[642]6.2738,[643]6.2760,[644]6.2772,[645]6.2751,[646]6.2793,[647]6.2755,[648]6.2762,[649]6.2763,[650]6.2801,[651]6.2858,[652]6.2865,[653]6.2908,[654]6.2844,[655]6.2838,

llama_print_timings:        load time = 11216.03 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4989892.61 ms / 335360 tokens (   14.88 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 5024616.43 ms

real	83m45.024s
user	126m54.284s
sys	4m10.884s
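The log is internally consistent: 655 chunks of 512 tokens cover exactly the 335,360 tokens reported by llama_print_timings, and the per-token time and ETA follow from simple division (values copied from the log above):

```python
n_chunks, n_ctx = 655, 512
tokens = n_chunks * n_ctx
print(tokens)                               # 335360 -- matches "/ 335360 tokens"

prompt_eval_ms = 4989892.61                 # from llama_print_timings
print(round(prompt_eval_ms / tokens, 2))    # 14.88 ms per token

print(round(10.60 * n_chunks / 3600, 2))    # 1.93 hours -- the printed ETA
```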
Q4_0 + F16 "output" M1 Pro (with BLAS) [655]6.2355
$  make clean && make -j perplexity && time ./perplexity -m ./models/7B/ggml-model-q4_0-output-f16.bin -f ./build/wiki.test.raw -t 8
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

rm -vf *.o main quantize quantize-stats perplexity embedding benchmark-q4_0-matmult
common.o
ggml.o
llama.o
perplexity
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c llama.cpp -o llama.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c examples/common.cpp -o common.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity  -framework Accelerate
main: seed = 1681643028
llama.cpp: loading model from ./models/7B/ggml-model-q4_0-output-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 5981.20 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
8.00 seconds per pass - ETA 1.46 hours
[1]4.3820,[2]4.9320,[3]5.7997,[4]6.4555,[5]6.5526,[6]6.5317,[7]6.6993,[8]6.7881,[9]7.1519,[10]7.3934,[11]7.6302,[12]7.6638,[13]7.5714,[14]7.6539,[15]7.9047,[16]7.5068,[17]7.3838,[18]7.3480,[19]6.9755,[20]6.9635,[21]6.8675,[22]6.6824,[23]6.6426,[24]6.5537,[25]6.5529,[26]6.3839,[27]6.2037,[28]6.1026,[29]6.0173,[30]5.8628,[31]5.8366,[32]5.8546,[33]5.7921,[34]5.8260,[35]5.8504,[36]5.8945,[37]5.9009,[38]5.9162,[39]5.9532,[40]6.0109,[41]6.0177,[42]6.0521,[43]6.0101,[44]6.0655,[45]6.0683,[46]6.0426,[47]6.0653,[48]6.0340,[49]6.0399,[50]6.0009,[51]5.9970,[52]5.9861,[53]6.0296,[54]6.0120,[55]5.9897,[56]6.0228,[57]6.0441,[58]6.0658,[59]6.0792,[60]6.1234,[61]6.1133,[62]6.1763,[63]6.2096,[64]6.2248,[65]6.2699,[66]6.2788,[67]6.2958,[68]6.3095,[69]6.3344,[70]6.3670,[71]6.3873,[72]6.4174,[73]6.4828,[74]6.4876,[75]6.5007,[76]6.5153,[77]6.5276,[78]6.5126,[79]6.5407,[80]6.5326,[81]6.5463,[82]6.5502,[83]6.4981,[84]6.4830,[85]6.4714,[86]6.4496,[87]6.3847,[88]6.3565,[89]6.3362,[90]6.3209,[91]6.3459,[92]6.3416,[93]6.3440,[94]6.3410,[95]6.3697,[96]6.3678,[97]6.3612,[98]6.3548,[99]6.3409,[100]6.3413,[101]6.3673,[102]6.3601,[103]6.3811,[104]6.3873,[105]6.3867,[106]6.4034,[107]6.4011,[108]6.4136,[109]6.4074,[110]6.4026,[111]6.4245,[112]6.4435,[113]6.4448,[114]6.4416,[115]6.4494,[116]6.4420,[117]6.4473,[118]6.4765,[119]6.4978,[120]6.5334,[121]6.5496,[122]6.5749,[123]6.6139,[124]6.6311,[125]6.6222,[126]6.6614,[127]6.6977,[128]6.7255,[129]6.7096,[130]6.7191,[131]6.7134,[132]6.7047,[133]6.6922,[134]6.7029,[135]6.6991,[136]6.6870,[137]6.6790,[138]6.6625,[139]6.6515,[140]6.6482,[141]6.6195,[142]6.6150,[143]6.5875,[144]6.5680,[145]6.5592,[146]6.5460,[147]6.5535,[148]6.5555,[149]6.5491,[150]6.5451,[151]6.5464,[152]6.5363,[153]6.5191,[154]6.5099,[155]6.5168,[156]6.5117,[157]6.5297,[158]6.5331,[159]6.5372,[160]6.5394,[161]6.5517,[162]6.5219,[163]6.5102,[164]6.4844,[165]6.4525,[166]6.4245,[167]6.3876,[168]6.3551,[169]6.3419,[170]6.3298,[171]6.3012,[172]6.2834,[173]6.2650,[174]6.2
349,[175]6.2133,[176]6.2031,[177]6.1820,[178]6.1591,[179]6.1420,[180]6.1333,[181]6.1111,[182]6.0922,[183]6.0782,[184]6.0781,[185]6.0709,[186]6.0727,[187]6.0783,[188]6.0745,[189]6.0930,[190]6.0941,[191]6.1149,[192]6.1311,[193]6.1486,[194]6.1601,[195]6.1811,[196]6.1974,[197]6.2193,[198]6.2349,[199]6.2384,[200]6.2431,[201]6.2386,[202]6.2590,[203]6.2657,[204]6.2656,[205]6.2766,[206]6.2843,[207]6.2805,[208]6.2886,[209]6.2935,[210]6.2989,[211]6.3091,[212]6.3164,[213]6.3269,[214]6.3305,[215]6.3342,[216]6.3496,[217]6.3675,[218]6.3807,[219]6.3809,[220]6.3774,[221]6.3711,[222]6.3680,[223]6.3574,[224]6.3502,[225]6.3459,[226]6.3671,[227]6.3756,[228]6.3817,[229]6.3879,[230]6.3838,[231]6.4007,[232]6.3874,[233]6.3703,[234]6.3544,[235]6.3387,[236]6.3314,[237]6.3205,[238]6.3237,[239]6.3076,[240]6.2971,[241]6.3001,[242]6.3037,[243]6.3022,[244]6.2904,[245]6.2879,[246]6.2758,[247]6.2636,[248]6.2570,[249]6.2548,[250]6.2590,[251]6.2516,[252]6.2482,[253]6.2383,[254]6.2341,[255]6.2226,[256]6.2035,[257]6.1923,[258]6.1839,[259]6.1820,[260]6.1740,[261]6.1699,[262]6.1641,[263]6.1597,[264]6.1406,[265]6.1398,[266]6.1385,
[267]6.1316,[268]6.1408,[269]6.1385,[270]6.1393,[271]6.1470,[272]6.1509,[273]6.1505,[274]6.1521,[275]6.1611,[276]6.1664,[277]6.1824,[278]6.1929,[279]6.2015,[280]6.2048,[281]6.2142,[282]6.2199,[283]6.2342,[284]6.2422,[285]6.2512,[286]6.2658,[287]6.2654,[288]6.2714,[289]6.2624,[290]6.2471,[291]6.2316,[292]6.2164,[293]6.2029,[294]6.2052,[295]6.2047,[296]6.2090,[297]6.2076,[298]6.2104,[299]6.2073,[300]6.1962,[301]6.1960,[302]6.1882,[303]6.1805,[304]6.1727,[305]6.1703,[306]6.1573,[307]6.1594,[308]6.1631,[309]6.1469,[310]6.1408,[311]6.1346,[312]6.1368,[313]6.1314,[314]6.1301,[315]6.1137,[316]6.1092,[317]6.0928,[318]6.0714,[319]6.0833,[320]6.0960,[321]6.0998,[322]6.0953,[323]6.0888,[324]6.0866,[325]6.0963,[326]6.0964,[327]6.0985,[328]6.1025,[329]6.1084,[330]6.1112,[331]6.1233,[332]6.1204,[333]6.1275,[334]6.1218,[335]6.1153,[336]6.1190,[337]6.1161,[338]6.1149,[339]6.1091,[340]6.1047,[341]6.1127,[342]6.1152,[343]6.1207,[344]6.1209,[345]6.1209,[346]6.1181,[347]6.1226,[348]6.1266,[349]6.1285,[350]6.1251,[351]6.1258,[352]6.1258,[353]6.1204,[354]6.1204,[355]6.1256,[356]6.1286,[357]6.1249,[358]6.1337,[359]6.1366,[360]6.1329,[361]6.1324,[362]6.1391,[363]6.1503,[364]6.1564,[365]6.1621,[366]6.1629,[367]6.1715,[368]6.1690,[369]6.1699,[370]6.1713,[371]6.1655,[372]6.1705,[373]6.1760,[374]6.1746,[375]6.1742,[376]6.1823,[377]6.1774,[378]6.1798,[379]6.1859,[380]6.1776,[381]6.1735,[382]6.1681,[383]6.1671,[384]6.1664,[385]6.1653,[386]6.1647,[387]6.1640,[388]6.1596,[389]6.1542,[390]6.1473,[391]6.1394,[392]6.1355,[393]6.1340,[394]6.1364,[395]6.1347,[396]6.1273,[397]6.1348,[398]6.1387,[399]6.1469,[400]6.1465,[401]6.1481,[402]6.1487,[403]6.1504,[404]6.1567,[405]6.1474,[406]6.1442,[407]6.1434,[408]6.1446,[409]6.1569,[410]6.1678,[411]6.1800,[412]6.1962,[413]6.2076,[414]6.2152,[415]6.2203,[416]6.2281,[417]6.2409,[418]6.2445,[419]6.2519,[420]6.2605,[421]6.2724,[422]6.2779,[423]6.2848,[424]6.2968,[425]6.3056,[426]6.3121,[427]6.3166,[428]6.3249,[429]6.3297,[430]6.3385,[431]6.3528,[432]6.3569,[433]6.3
558,[434]6.3512,[435]6.3519,[436]6.3545,[437]6.3639,[438]6.3717,[439]6.3684,[440]6.3674,[441]6.3625,[442]6.3614,[443]6.3627,[444]6.3630,[445]6.3610,[446]6.3632,[447]6.3660,[448]6.3703,[449]6.3677,[450]6.3680,[451]6.3638,[452]6.3522,[453]6.3438,[454]6.3377,[455]6.3386,[456]6.3433,[457]6.3450,[458]6.3429,[459]6.3436,[460]6.3522,[461]6.3495,[462]6.3479,[463]6.3529,[464]6.3519,[465]6.3489,[466]6.3410,[467]6.3414,[468]6.3413,[469]6.3435,[470]6.3440,[471]6.3390,[472]6.3435,[473]6.3379,[474]6.3393,[475]6.3335,[476]6.3353,[477]6.3282,[478]6.3274,[479]6.3338,[480]6.3389,[481]6.3408,[482]6.3364,[483]6.3322,[484]6.3343,[485]6.3329,[486]6.3274,[487]6.3274,[488]6.3255,[489]6.3205,[490]6.3179,[491]6.3145,[492]6.3086,[493]6.3057,[494]6.3041,[495]6.3037,[496]6.3003,[497]6.2949,[498]6.2930,[499]6.2881,[500]6.2787,[501]6.2720,[502]6.2719,[503]6.2714,[504]6.2622,[505]6.2649,[506]6.2658,[507]6.2600,[508]6.2560,[509]6.2550,[510]6.2591,[511]6.2636,[512]6.2669,[513]6.2686,[514]6.2751,[515]6.2697,[516]6.2690,[517]6.2701,[518]6.2702,[519]6.2731,[520]6.2760,[521]6.2776,[522]6.2806,[523]6.2815,[524]6.2875,[525]6.2910,[526]6.2922,[527]6.2943,[528]6.2891,[529]6.2897,[530]6.2849,[531]6.2839,[532]6.2887,[533]6.2910,[534]6.2893,[535]6.2916,[536]6.2861,[537]6.2838,[538]6.2888,[539]6.2900,[540]6.2940,[541]6.2948,[542]6.2956,[543]6.2970,[544]6.2981,[545]6.2960,[546]6.2967,[547]6.2922,[548]6.2868,[549]6.2868,[550]6.2841,[551]6.2805,[552]6.2784,[553]6.2743,[554]6.2720,[555]6.2691,[556]6.2689,[557]6.2712,[558]6.2674,[559]6.2667,[560]6.2662,[561]6.2662,[562]6.2640,[563]6.2640,[564]6.2682,[565]6.2698,[566]6.2696,[567]6.2675,[568]6.2679,[569]6.2663,[570]6.2689,[571]6.2694,[572]6.2703,[573]6.2703,[574]6.2667,[575]6.2664,[576]6.2661,[577]6.2651,[578]6.2630,[579]6.2639,[580]6.2572,[581]6.2534,[582]6.2524,[583]6.2531,[584]6.2534,[585]6.2457,[586]6.2390,[587]6.2393,[588]6.2442,[589]6.2499,[590]6.2530,[591]6.2549,[592]6.2535,[593]6.2501,[594]6.2511,[595]6.2488,[596]6.2525,[597]6.2503,[598]6.2466,[599]6.2486,[600
]6.2482,[601]6.2469,[602]6.2486,[603]6.2516,[604]6.2526,[605]6.2557,[606]6.2576,[607]6.2561,[608]6.2527,[609]6.2531,[610]6.2568,[611]6.2549,[612]6.2574,[613]6.2537,[614]6.2487,[615]6.2411,[616]6.2440,[617]6.2379,[618]6.2330,[619]6.2274,[620]6.2133,[621]6.2061,[622]6.2043,[623]6.2058,[624]6.2062,[625]6.2061,[626]6.2047,[627]6.2068,[628]6.2073,[629]6.2069,[630]6.2102,[631]6.2166,[632]6.2219,[633]6.2203,[634]6.2238,[635]6.2244,[636]6.2212,[637]6.2179,[638]6.2206,[639]6.2176,[640]6.2187,[641]6.2189,[642]6.2256,[643]6.2279,[644]6.2290,[645]6.2271,[646]6.2314,[647]6.2275,[648]6.2282,[649]6.2282,[650]6.2320,[651]6.2377,[652]6.2384,[653]6.2426,[654]6.2361,[655]6.2355,

llama_print_timings:        load time =  8543.28 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4971705.35 ms / 335360 tokens (   14.82 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 5006225.35 ms

real	83m26.348s
user	126m29.113s
sys	4m14.981s

Perplexity delta: -0.0483
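Two quick cross-checks on these numbers (a sketch; the Q4_0 layout assumed here is a 32-weight block with one f32 scale and 4 bits per weight, i.e. 20 bytes per block): recomputing the delta from the final [655] chunk values, and confirming that the "mem required" gap between the two runs is exactly the F16-vs-Q4_0 size difference of the 4096×32000 output tensor:

```python
# Final-chunk perplexities copied from the two runs above.
ppl_q4_0    = 6.2838
ppl_f16_out = 6.2355
print(round(ppl_f16_out - ppl_q4_0, 4))   # -0.0483

# Size of the 4096 x 32000 "output" tensor in each format.
n_params = 4096 * 32000
f16_mb  = n_params * 2 / 1024**2          # 2 bytes per weight -> 250.0 MB
q4_0_mb = n_params * 20 / 32 / 1024**2    # 20 bytes per 32-weight block -> 78.125 MB
print(round(f16_mb - q4_0_mb, 2))         # 171.88 MB

# Matches the reported "mem required": 5981.20 MB vs 5809.32 MB.
print(round(5981.20 - 5809.32, 2))        # 171.88
```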

Q4_0 + F16 "output" + F16 "tok_embd" M1 Pro (with BLAS) [655]6.2357
$  make clean && make -j perplexity && time ./perplexity -m ./models/7B/ggml-model-q4_0-output-f16-tok-f16.bin -f ./build/wiki.test.raw -t 8
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)
rm -vf *.o main quantize quantize-stats perplexity embedding benchmark-q4_0-matmult
common.o
ggml.o
llama.o
perplexity
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c llama.cpp -o llama.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c examples/common.cpp -o common.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity  -framework Accelerate
main: seed = 1681648653
llama.cpp: loading model from ./models/7B/ggml-model-q4_0-output-f16-tok-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 6153.07 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
15.48 seconds per pass - ETA 2.82 hours
[1]4.3765,[2]4.9302,[3]5.7950,[4]6.4520,[5]6.5518,[6]6.5306,[7]6.6998,[8]6.7882,[9]7.1546,[10]7.3953,[11]7.6318,[12]7.6658,[13]7.5747,[14]7.6587,[15]7.9104,[16]7.5129,[17]7.3892,[18]7.3527,[19]6.9798,[20]6.9674,[21]6.8726,[22]6.6872,[23]6.6463,[24]6.5580,[25]6.5584,[26]6.3886,[27]6.2075,[28]6.1053,[29]6.0200,[30]5.8647,[31]5.8386,[32]5.8566,[33]5.7941,[34]5.8282,[35]5.8526,[36]5.8961,[37]5.9029,[38]5.9178,[39]5.9545,[40]6.0124,[41]6.0197,[42]6.0541,[43]6.0122,[44]6.0675,[45]6.0701,[46]6.0438,[47]6.0666,[48]6.0354,[49]6.0414,[50]6.0023,[51]5.9984,[52]5.9875,[53]6.0309,[54]6.0133,[55]5.9911,[56]6.0240,[57]6.0455,[58]6.0671,[59]6.0803,[60]6.1247,[61]6.1146,[62]6.1776,[63]6.2109,[64]6.2261,[65]6.2713,[66]6.2799,[67]6.2971,[68]6.3109,[69]6.3362,[70]6.3690,[71]6.3895,[72]6.4198,[73]6.4849,[74]6.4895,[75]6.5025,[76]6.5167,[77]6.5289,[78]6.5140,[79]6.5421,[80]6.5340,[81]6.5476,[82]6.5512,[83]6.4990,[84]6.4838,[85]6.4722,[86]6.4504,[87]6.3854,[88]6.3571,[89]6.3370,[90]6.3217,[91]6.3466,[92]6.3422,[93]6.3448,[94]6.3418,[95]6.3706,[96]6.3687,[97]6.3623,[98]6.3560,[99]6.3420,[100]6.3425,[101]6.3685,[102]6.3614,[103]6.3825,[104]6.3886,[105]6.3881,[106]6.4049,[107]6.4026,[108]6.4156,[109]6.4094,[110]6.4046,[111]6.4266,[112]6.4455,[113]6.4468,[114]6.4436,[115]6.4513,[116]6.4439,[117]6.4491,[118]6.4783,[119]6.4997,[120]6.5353,[121]6.5514,[122]6.5767,[123]6.6158,[124]6.6331,[125]6.6241,[126]6.6633,[127]6.6996,[128]6.7275,[129]6.7114,[130]6.7209,[131]6.7152,[132]6.7066,[133]6.6939,[134]6.7044,[135]6.7006,[136]6.6885,[137]6.6804,[138]6.6637,[139]6.6526,[140]6.6492,[141]6.6204,[142]6.6161,[143]6.5886,[144]6.5692,[145]6.5603,[146]6.5471,[147]6.5544,[148]6.5564,[149]6.5499,[150]6.5458,[151]6.5472,[152]6.5370,[153]6.5199,[154]6.5106,[155]6.5175,[156]6.5124,[157]6.5304,[158]6.5340,[159]6.5381,[160]6.5404,[161]6.5526,[162]6.5227,[163]6.5111,[164]6.4853,[165]6.4534,[166]6.4254,[167]6.3884,[168]6.3559,[169]6.3427,[170]6.3306,[171]6.3019,[172]6.2840,[173]6.2657,[174]6.2357,[175]6.2140,[176]6.2
038,[177]6.1827,[178]6.1598,[179]6.1428,[180]6.1340,[181]6.1119,[182]6.0930,[183]6.0791,[184]6.0789,[185]6.0717,[186]6.0734,[187]6.0791,[188]6.0753,[189]6.0939,[190]6.0949,[191]6.1156,[192]6.1319,[193]6.1494,[194]6.1610,[195]6.1818,[196]6.1981,[197]6.2201,[198]6.2356,[199]6.2390,[200]6.2437,[201]6.2393,[202]6.2599,[203]6.2666,[204]6.2665,[205]6.2775,[206]6.2852,[207]6.2814,[208]6.2895,[209]6.2944,[210]6.2998,[211]6.3097,[212]6.3169,[213]6.3276,[214]6.3313,[215]6.3351,[216]6.3505,[217]6.3684,[218]6.3818,[219]6.3821,[220]6.3785,[221]6.3725,[222]6.3694,[223]6.3587,[224]6.3516,[225]6.3472,[226]6.3684,[227]6.3769,[228]6.3829,[229]6.3891,[230]6.3850,[231]6.4020,[232]6.3888,[233]6.3716,[234]6.3558,[235]6.3401,[236]6.3328,[237]6.3219,[238]6.3252,[239]6.3091,[240]6.2985,[241]6.3015,[242]6.3051,[243]6.3035,[244]6.2918,[245]6.2893,[246]6.2772,[247]6.2648,[248]6.2582,[249]6.2559,[250]6.2602,[251]6.2529,[252]6.2495,[253]6.2396,[254]6.2354,[255]6.2239,[256]6.2048,[257]6.1935,[258]6.1852,[259]6.1833,[260]6.1753,[261]6.1711,[262]6.1654,[263]6.1610,[264]6.1418,[265]6.1411,[266]6.1397,[267]6.1329,[268]6.1421,[269]6.1399,[270]6.1406,[271]6.1484,[272]6.1524,[273]6.1519,[274]6.1535,[275]6.1624,[276]6.1677,[277]6.1837,[278]6.1943,[279]6.2029,[280]6.2061,[281]6.2156,[282]6.2212,[283]6.2356,[284]6.2435,[285]6.2525,[286]6.2670,[287]6.2665,[288]6.2726,[289]6.2635,[290]6.2482,[291]6.2327,[292]6.2175,[293]6.2039,[294]6.2063,[295]6.2057,[296]6.2100,[297]6.2086,[298]6.2113,[299]6.2083,[300]6.1971,[301]6.1970,[302]6.1892,[303]6.1815,[304]6.1736,[305]6.1713,[306]6.1582,[307]6.1603,[308]6.1640,[309]6.1478,[310]6.1417,[311]6.1355,[312]6.1377,[313]6.1324,[314]6.1310,[315]6.1147,[316]6.1102,[317]6.0938,[318]6.0723,[319]6.0841,[320]6.0970,[321]6.1008,[322]6.0964,[323]6.0899,[324]6.0876,[325]6.0974,[326]6.0974,[327]6.0995,[328]6.1037,[329]6.1096,[330]6.1124,[331]6.1245,[332]6.1216,[333]6.1287,[334]6.1230,[335]6.1165,[336]6.1202,[337]6.1173,[338]6.1161,[339]6.1103,[340]6.1059,[341]6.1138,[342]6.1163,[343
]6.1218,[344]6.1220,[345]6.1220,[346]6.1191,[347]6.1236,[348]6.1276,[349]6.1296,[350]6.1262,[351]6.1269,[352]6.1268,[353]6.1215,[354]6.1214,[355]6.1267,[356]6.1296,[357]6.1259,[358]6.1347,[359]6.1376,[360]6.1340,[361]6.1334,[362]6.1401,[363]6.1513,[364]6.1575,[365]6.1632,[366]6.1639,[367]6.1725,[368]6.1701,[369]6.1710,[370]6.1723,[371]6.1665,[372]6.1715,[373]6.1770,[374]6.1756,[375]6.1751,[376]6.1832,[377]6.1783,[378]6.1807,[379]6.1868,[380]6.1785,[381]6.1744,[382]6.1690,[383]6.1679,[384]6.1672,[385]6.1661,[386]6.1656,[387]6.1649,[388]6.1604,[389]6.1550,[390]6.1481,[391]6.1402,[392]6.1363,[393]6.1348,[394]6.1373,[395]6.1356,[396]6.1281,[397]6.1356,[398]6.1396,[399]6.1478,[400]6.1473,[401]6.1489,[402]6.1495,[403]6.1512,[404]6.1575,[405]6.1482,[406]6.1450,[407]6.1442,[408]6.1454,[409]6.1576,[410]6.1686,[411]6.1807,[412]6.1968,[413]6.2082,[414]6.2157,[415]6.2208,[416]6.2287,[417]6.2415,[418]6.2451,[419]6.2525,[420]6.2611,[421]6.2730,[422]6.2785,[423]6.2854,[424]6.2974,[425]6.3062,[426]6.3127,[427]6.3172,[428]6.3255,[429]6.3302,[430]6.3390,[431]6.3533,[432]6.3575,[433]6.3564,[434]6.3518,[435]6.3524,[436]6.3550,[437]6.3643,[438]6.3721,[439]6.3688,[440]6.3678,[441]6.3629,[442]6.3618,[443]6.3631,[444]6.3634,[445]6.3615,[446]6.3636,[447]6.3664,[448]6.3708,[449]6.3681,[450]6.3685,[451]6.3642,[452]6.3527,[453]6.3442,[454]6.3381,[455]6.3390,[456]6.3436,[457]6.3454,[458]6.3432,[459]6.3439,[460]6.3525,[461]6.3498,[462]6.3482,[463]6.3532,[464]6.3522,[465]6.3493,[466]6.3414,[467]6.3418,[468]6.3417,[469]6.3439,[470]6.3444,[471]6.3394,[472]6.3440,[473]6.3383,[474]6.3398,[475]6.3339,[476]6.3358,[477]6.3287,[478]6.3279,[479]6.3343,[480]6.3393,[481]6.3412,[482]6.3368,[483]6.3326,[484]6.3347,[485]6.3333,[486]6.3277,[487]6.3278,[488]6.3259,[489]6.3210,[490]6.3183,[491]6.3150,[492]6.3091,[493]6.3062,[494]6.3046,[495]6.3042,[496]6.3009,[497]6.2954,[498]6.2935,[499]6.2886,[500]6.2792,[501]6.2724,[502]6.2724,[503]6.2718,[504]6.2627,[505]6.2654,[506]6.2664,[507]6.2605,[508]6.2565,[509]6.2556,
[510]6.2596,[511]6.2640,[512]6.2675,[513]6.2692,[514]6.2757,[515]6.2703,[516]6.2695,[517]6.2707,[518]6.2707,[519]6.2737,[520]6.2765,[521]6.2781,[522]6.2811,[523]6.2820,[524]6.2880,[525]6.2915,[526]6.2926,[527]6.2947,[528]6.2896,[529]6.2901,[530]6.2853,[531]6.2843,[532]6.2891,[533]6.2914,[534]6.2897,[535]6.2921,[536]6.2866,[537]6.2843,[538]6.2892,[539]6.2905,[540]6.2944,[541]6.2952,[542]6.2960,[543]6.2974,[544]6.2985,[545]6.2965,[546]6.2971,[547]6.2926,[548]6.2873,[549]6.2873,[550]6.2846,[551]6.2809,[552]6.2788,[553]6.2747,[554]6.2723,[555]6.2694,[556]6.2692,[557]6.2716,[558]6.2676,[559]6.2670,[560]6.2665,[561]6.2665,[562]6.2643,[563]6.2642,[564]6.2685,[565]6.2701,[566]6.2698,[567]6.2677,[568]6.2682,[569]6.2666,[570]6.2691,[571]6.2696,[572]6.2705,[573]6.2706,[574]6.2669,[575]6.2666,[576]6.2664,[577]6.2653,[578]6.2632,[579]6.2640,[580]6.2574,[581]6.2536,[582]6.2526,[583]6.2532,[584]6.2536,[585]6.2459,[586]6.2392,[587]6.2395,[588]6.2444,[589]6.2501,[590]6.2531,[591]6.2551,[592]6.2537,[593]6.2502,[594]6.2512,[595]6.2489,[596]6.2526,[597]6.2504,[598]6.2467,[599]6.2487,[600]6.2483,[601]6.2470,[602]6.2487,[603]6.2517,[604]6.2526,[605]6.2558,[606]6.2577,[607]6.2562,[608]6.2528,[609]6.2533,[610]6.2569,[611]6.2550,[612]6.2575,[613]6.2538,[614]6.2488,[615]6.2412,[616]6.2441,[617]6.2380,[618]6.2330,[619]6.2275,[620]6.2134,[621]6.2062,[622]6.2044,[623]6.2060,[624]6.2063,[625]6.2062,[626]6.2048,[627]6.2070,[628]6.2075,[629]6.2071,[630]6.2104,[631]6.2166,[632]6.2220,[633]6.2204,[634]6.2239,[635]6.2245,[636]6.2214,[637]6.2181,[638]6.2207,[639]6.2178,[640]6.2188,[641]6.2190,[642]6.2258,[643]6.2280,[644]6.2291,[645]6.2272,[646]6.2315,[647]6.2276,[648]6.2283,[649]6.2283,[650]6.2321,[651]6.2379,[652]6.2386,[653]6.2427,[654]6.2362,[655]6.2357,

llama_print_timings:        load time = 16073.04 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4977968.90 ms / 335360 tokens (   14.84 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 5012024.06 ms

real	83m32.235s
user	126m43.253s
sys	4m15.665s

Perplexity delta: -0.0540

M1 Pro results

| tok_embd | output | Perplexity (with BLAS) | Delta   | Size |
|----------|--------|------------------------|---------|------|
| Q4_0     | Q4_0   | 6.2897                 | 0.0000  | 3.9G |
| Q4_0     | F16    | 6.2355                 | -0.0542 | 4.1G |
| F16      | Q4_0   | TBD                    | TBD     | 4.1G |
| F16      | F16    | 6.2357                 | -0.0540 | 4.3G |

ggerganov (Apr 15 '23)

Leaving 1 or 2 layers fully uncompressed has very good results in the literature, from what I recall. This is a good idea to test.

MarkSchmidty (Apr 15 '23)

Looks like quantizing the output tensor has a more significant impact on quality than quantizing the tok_embeddings tensor (which is the same size, but sits at the start of the transformer). It might be worth keeping output in F16 format.

ggerganov (Apr 16 '23)