Mack Straight
doh, thanks for pointing that out, I've only been using fp16 =) will fix.
The handling of UTF-8 here is exactly the same as SentencePiece's: multi-byte characters that don't form tokens are output one byte at a time.
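For illustration, a minimal sketch of that byte fallback, assuming a hypothetical `token_id_for(piece)` vocab lookup (SentencePiece encodes raw bytes as pieces like `<0x41>`):

```cpp
#include <cstdio>
#include <string>
#include <vector>

int token_id_for(const std::string & piece); // assumed vocab lookup, returns -1 on a miss

std::vector<int> tokenize_char(const std::string & utf8_char) {
    // if the whole character is in the vocab, use it directly
    if (int id = token_id_for(utf8_char); id >= 0) {
        return { id };
    }
    // byte fallback: emit one single-byte token per byte of the character
    std::vector<int> out;
    for (unsigned char b : utf8_char) {
        char piece[8];
        std::snprintf(piece, sizeof(piece), "<0x%02X>", b);
        out.push_back(token_id_for(piece));
    }
    return out;
}
```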
"why not both?" - changed file magic so existing unversioned files don't misparse (ggml -> ggmf "gg model file") - now a version number in the header
the token vector should probably be a struct now that also includes the score (see https://github.com/ggerganov/llama.cpp/commit/074bea2eb1f1349a0118239c4152914aecaa1be4)
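Something like this shape, for illustration (names here are made up, not necessarily the ones in the linked commit):

```cpp
#include <string>
#include <vector>

struct token_entry {
    std::string text;   // the token's piece
    float       score;  // its sentencepiece score, used when picking merges
};

std::vector<token_entry> id_to_token; // indexed by token id, replaces the plain vector<string>
```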
this is just an out-of-bounds write to memory_k/memory_v when n_past goes past the end, ya? if you add this assert to ggml_view_1d: `GGML_ASSERT((ne0 * GGML_TYPE_SIZE[a->type])/GGML_BLCK_SIZE[a->type] + offset <= ggml_nbytes(a));` it will catch it.
> This looks very reasonable. The question is why we don't see a problem with llama but do with alpaca...

nah, it's reproducible with any model. the key difference is...
Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files; you can uncomment the os.rename to do it in place if you want, but I didn't want to overwrite...)
If you don't have access to the original LLaMA files, I think someone uploaded it here: https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model
the tokenizer.model contains scores for each token, most of which are just the negation of the token index (since they're output by the BPE trainer in descending order), so I...
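Purely to illustrate that relationship (the real convert script reads the actual scores out of tokenizer.model), a fallback that reconstructs scores from the index would look like:

```cpp
#include <cstddef>
#include <vector>

// approximation only: valid because the BPE trainer emits tokens in
// descending score order, so score[i] is (mostly) just -i
void fill_default_scores(std::vector<float> & scores) {
    for (std::size_t i = 0; i < scores.size(); i++) {
        scores[i] = -static_cast<float>(i);
    }
}
```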