llama.cpp
Commit c9f670a (Implement non-greedy tokenizer that tries to maximize token lengths) breaks llama?
Old version:
.\build\Release\llama.exe -m C:\...\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 100 --temp 0.2 -p "list all US states in alphabetical order:"
output: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming ... (keeps repeating)
.\build\Release\llama.exe -m C:\...\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 200 --temp 0.2 -p "list all US states in alphabetical order:"
output: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming
list all US states in alphabetical order [end of text]
.\build\Release\llama.exe -m C:\...\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 300 --temp 0.2 -p "list all US states in alphabetical order:"
output: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming. ... (keeps repeating)
New version (after commit c9f670a):
.\llama.exe -m C:\...\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 100 --temp 0.2 -p "list all US states in alphabetical order:"
output: list the 50 state capitals (in no particular order): [end of text]
.\llama.exe -m C:\...\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 200 --temp 0.2 -p "list all US states in alphabetical order:"
output: list the 50 state capitals and their abbreviations (e.g., Sacramento, CA): [end of text]
.\llama.exe -m C:\...\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 200 --temp 0.2 -p "list all US states in alphabetical order:"
output: list the 50 largest cities of USA by population (2017): [end of text]
Could you please check how it behaves with the BPE tokenizer which is not yet merged? https://github.com/ggerganov/llama.cpp/pull/252
Could you also copy here the tokens that were generated for the "list all US states..." prompt in the current version (they are printed when llama starts)?
list of tokens and output:
main: prompt: ' list all US states in alphabetical order:'
main: number of tokens in prompt = 10
     1 -> ''
  1051 -> ' list'
   599 -> ' all'
  3148 -> ' US'
  5922 -> ' states'
   297 -> ' in'
 22968 -> ' alphabet'
   936 -> 'ical'
  1797 -> ' order'
 29901 -> ':'
list the 50 state capitals (in no particular order): [end of text]
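For intuition about what a length-maximizing tokenizer does, here is a minimal greedy longest-match sketch in Python. This is illustrative only: the toy vocabulary is invented, and the actual commit implements a different (non-greedy) strategy that maximizes total token length rather than matching greedily.

```python
# Illustrative sketch only -- NOT the llama.cpp implementation.
# A longest-match tokenizer repeatedly takes the longest vocabulary
# entry that prefixes the remaining text, falling back to single
# characters for unknown spans. TOY_VOCAB is an invented example.
TOY_VOCAB = {" list", " all", " alphabet", "ical", " order", ":", " ", "l", "i", "s", "t"}

def longest_match_tokenize(text, vocab):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest prefix of the remaining text that is in the vocab.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocab entry matches: emit a single character as a fallback.
            tokens.append(text[i])
            i += 1
    return tokens

print(longest_match_tokenize(" list all alphabetical order:", TOY_VOCAB))
```

Note how " alphabetical" splits into ' alphabet' + 'ical', matching the token dump above; whether a word is split this way or into many short fragments can meaningfully change the model's output.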
The version you linked complains that my model files are too old ("too old, regenerate your model files!").
After regenerating the model files (converting from pth and quantizing), it still doesn't work correctly:
.\build\Release\llama.exe -m .\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 100 --temp 0.2 -p " list all US states in alphabetical order:"
main: prompt: ' list all US states in alphabetical order:'
main: number of tokens in prompt = 11
     1 -> ''
 29871 -> ' '
  1051 -> ' list'
   599 -> ' all'
  3148 -> ' US'
  5922 -> ' states'
   297 -> ' in'
 22968 -> ' alphabet'
   936 -> 'ical'
  1797 -> ' order'
 29901 -> ':'
sampling parameters: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
list all US states in alphabetical order: for i, state_name in enumerate(state.abbr): print(i + 1) """ return [x[0] if isinstance(x, tuple) else x for _, x in sorted(list)] [end of text]
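For reference, the sampling parameters printed above (temp, top_k, repeat_penalty, etc.) combine roughly as in this simplified Python sketch. This is not the exact llama.cpp sampler, just an approximation of how these knobs interact:

```python
import math
import random

def sample_token(logits, temp=0.2, top_k=40, recent_tokens=(), repeat_penalty=1.3):
    """Simplified temperature + top-k sampling with a repetition penalty.

    `logits` maps token id -> raw score. Illustrative sketch only,
    not the actual llama.cpp sampling code.
    """
    scaled = {}
    for tok, logit in logits.items():
        # Penalize tokens seen recently to discourage repetition loops.
        if tok in recent_tokens:
            logit = logit / repeat_penalty if logit > 0 else logit * repeat_penalty
        # Lower temperature sharpens the distribution toward the top token.
        scaled[tok] = logit / temp
    # Keep only the top_k highest-scoring candidates.
    top = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors (shifted by the max for stability), then sample.
    m = max(v for _, v in top)
    weights = [math.exp(v - m) for _, v in top]
    return random.choices([t for t, _ in top], weights=weights)[0]
```

At temp = 0.2 the distribution is already quite peaked, which is why low-temperature runs with different seeds still produce very similar completions.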
That's interesting, I'm getting really different results with -t 8 and -t 10 on an M1 Pro laptop / 7B model.
system_info: n_threads = 10 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
list all US states in alphabetical order:
Arizona, Arkansas, California (the Golden State), Colorado, Connecticut, Delaware, District of Columbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire (the Granite State), New Jersey, New Mexico, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington DC, West Virgina and Wyoming.
The 50 states of the United States are listed in alphabetical order: Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District Of Columbia Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire (the Granite State)New Jersey New Mexico North Carolina Ohio Oklahoma Oregon Pennsylvania Rhode Island South Dakota Tennessee Texas Utah Vermont Virginia Washington DC West Virgina Wisconsin Wyoming
The 50 states of the United States are listed in alphabetical order: Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District Of Columbia
main: predict time = 3814.46 ms / 51.55 ms per token
system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
list all US states in alphabetical order:
http://en.wikipedia.org/wiki/List_of...
I'm not sure what you mean by "in the middle of" a state, but if it means that they are surrounded on both sides (and no other states) then I think there is only one such case - Hawaii! [end of text]
main: predict time = 47638.37 ms / 179.77 ms per token
Tried with different thread counts, and it seems this affects not only performance but also inference quality. It looks like 1, 4, and 8 threads are safe on my machine.
Wow, you're right. In my case it answers correctly with 4 threads but not with 8 or 10. Same prompt, same seed; the only difference is the number of threads.
The number of threads affects the output due to floating-point rounding; this is known: #95
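The effect is easy to reproduce: with a different thread count, partial sums are combined in a different order, and floating-point addition is not associative, so the rounded result can differ. A minimal demonstration:

```python
# Floating-point addition is not associative: summing the same numbers
# in a different order (as different thread counts effectively do when
# per-thread partial sums are combined) can change the rounded result.
vals = [1e16, 1.0, -1e16, 1.0]

# "Single-threaded" order: strictly left to right.
# 1e16 + 1.0 rounds back to 1e16 (the 1.0 is lost), so the total is 1.0.
serial = ((vals[0] + vals[1]) + vals[2]) + vals[3]

# "Two-thread" order: each half summed separately, then combined.
# The large values cancel exactly, so both 1.0s survive: the total is 2.0.
parallel = (vals[0] + vals[2]) + (vals[1] + vals[3])

print(serial, parallel)  # 1.0 2.0 -- different results from identical inputs
```

In a transformer forward pass these tiny rounding differences accumulate over millions of operations and can eventually flip which token has the highest probability, which is why the generated text diverges.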
After more testing I think we can close this one. The new version either matches or outperforms the old one on most tasks. The thread count affecting the output is still a problem, but that wasn't caused by this commit.