klosax
Why not go even further? Make the common infrastructure of llama.cpp become something like "ggml-llm" and the code for the specific llm architectures (llama, gpt-2, gpt-j, mpt and others) become...
The gguf [gpt2 tokenizer](https://github.com/ggerganov/llama.cpp/blob/gguf/cmpnct_gpt2bpe.hpp) also has a Trie implementation. The tokenizer is under the MIT license. Maybe it could be reused for the llama tokenizer.
The author of the gpt2 tokenizer gave permission to use it and stated that it is under the MIT license here: https://github.com/ggerganov/llama.cpp/pull/2398#issuecomment-1667009979
It looks like the MIT and Apache licenses are compatible, but a copy of the Apache license and a NOTICE file must be included: https://softwareengineering.stackexchange.com/questions/51987/how-to-include-an-apache-library-with-my-opensource-code#52223
What is the difference between `max_seq_len` and `context_length`? Aren't both the maximum usable/recommended context length?
I suggest using special key-values to identify special tokens:

- `tokenizer.bos_token_id`: Beginning of sequence marker
- `tokenizer.eos_token_id`: End of sequence marker
- `tokenizer.unk_token_id`: Unknown token
- `tokenizer.sep_token_id`: Separator token
- `tokenizer.pad_token_id`: Padding token
Why not use less cryptic key naming?

- `[llm].hidden_size` --> `[llm].embedding_length`
- `[llm].n_ff` --> `[llm].feedforward_length`
- `[llm].n_layers` --> `[llm].num_layers`
- `[llm].attention.n_heads` --> `[llm].attention.num_heads`
- `[llm].rope.n_dims` --> `[llm].rope.num_dims`

or even better, change `n_` and `num_`...
I tend to prefer `_count` instead of `num_`, as in `gguf_header_t`:

```
uint32_t tensor_count;
uint32_t metadata_kv_count;
```

```
gguf_tensor_info_t:
uint32_t n_dimensions;              --> uint32 dimension_count;
uint32_t dimensions[n_dimensions];  --> uint32 dimensions[dimension_count];
uint32_t...
```
More descriptive: `[llm].rope.scale` --> `[llm].rope.context_scale`
> Luckily, @klosax already [did these for v1 of the spec](https://github.com/klosax/ggml/tree/gguf/examples/gguf)! Hopefully, we can just update this code and we should be good to go.

I think this should be...