llama.cpp
always "failed to tokenize string!"
failed to tokenize string!
system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
failed to tokenize string!
main: prompt: ' china'
main: number of tokens in prompt = 1
1 -> ''
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
曲ー! /S部ュース / KSHErsLAheLUE - THE NEW CH`,MEgeERSION IS HERE@ÿThis entry was вер in news on JuneSASSSASS8 by adminS [end of text]
Can you provide the command line and a checksum of the model file?
same problem, ggml-model-q4_0.bin, md5sum is 919e4f8aee6ce4f3fbabb6cbcd7756db
Can you provide the command line and a checksum of the model file?
./main -m ./models/7B/ggml-model-q4_0.bin -p "china" -n 512
checksum:
md5sum ggml-model-q4_0.bin
919e4f8aee6ce4f3fbabb6cbcd7756db  ggml-model-q4_0.bin
6efc8dab194ab59e49cd24be5574d85e  consolidated.00.pth
The files look good, though these are the "old" format; you'll have to regenerate them if you update to the latest master.
There should be three tokens recognized with the old tokenizer:
main: prompt: ' china'
main: number of tokens in prompt = 3
1 -> ''
18558 -> ' chi'
1056 -> 'na'
The new tokenizer gives different tokens:
main: prompt: ' china'
main: number of tokens in prompt = 3
1 -> ''
521 -> ' ch'
1099 -> 'ina'
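To illustrate why the two tokenizers split the same prompt differently, here is a toy greedy longest-match tokenizer. This is not the actual llama.cpp tokenizer, and the two vocabularies are hypothetical; the sketch only shows how a different vocabulary yields a different split of " china".

```python
# Illustrative sketch only: greedy longest-match tokenization over two
# hypothetical vocabularies, showing ' chi'+'na' vs ' ch'+'ina'.

def greedy_tokenize(text, vocab):
    """Split text into the longest matching vocab entries, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try longest piece first
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            tokens.append(text[i])  # fallback: single character
            i += 1
    return tokens

old_vocab = {" chi", "na"}   # hypothetical "old" vocabulary
new_vocab = {" ch", "ina"}   # hypothetical "new" vocabulary

print(greedy_tokenize(" china", old_vocab))  # [' chi', 'na']
print(greedy_tokenize(" china", new_vocab))  # [' ch', 'ina']
```

Either split is a valid encoding of the prompt; the model just sees different token ids, which is why regenerating the model files to match the tokenizer matters.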
I really can't explain this, unless you have some strange terminal encoding set?
encoding is LANG=en_US.UTF-8
Thank you very much, it works now after I upgraded Python to 3.9, pulled the latest master code, and redeployed.
Possibly a duplicate of #113.