compilade
Okay, I think this is finally ready for review. Pretty much everything works (on CPU, at least), and I've updated the first post with an "Out of scope for this...
> Are you familiar with RWKV?

I've read about it, but I'm not as familiar with RWKV as I'd like, unfortunately. :sweat_smile:

> I'm wondering how well the proposed changes...
> Will be reviewing this PR in the following days, thanks

Since `master` continues to change (and that's good), I hope it's okay if I resolve conflicts with ~~`git rebase...
> The implementation is pretty good

Thanks!

> I'm still not convinced we need to introduce `n_parallel` and `llama_n_max_seq()`. Imagine the following case: A user wants to use Mamba 3B...
Since the `transformers` library is getting support for Mamba (https://github.com/huggingface/transformers/pull/28094), the official Mamba models have been re-released with more metadata. See https://huggingface.co/collections/state-spaces/transformers-compatible-mamba-65e7b40ab87e5297e45ae406 I think I should rename the GGUF key-value...
> We should actually start using `llama_n_max_seq()` instead of `n_ctx` to init batches in the examples to make it more semantically clear.

There might be a misunderstanding here. To be...
> Sorry if this is stupid, but when we call `llama_batch_init(n_ctx, 0, params.n_parallel)` don't we create n_parallel batches of size n_ctx when it should be of size n_ctx_slot?

If I...
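The confusion above is worth spelling out: `llama_batch_init(n_ctx, 0, params.n_parallel)` allocates a *single* batch with capacity for `n_ctx` tokens, shared by all sequences, not `n_parallel` separate batches. A minimal Python sketch (hypothetical model, not the actual llama.cpp API) of that shared-capacity idea:

```python
# Hypothetical sketch of batch sizing, NOT the llama.cpp API:
# one batch with a total token capacity shared by all sequences.
from dataclasses import dataclass, field

@dataclass
class Batch:
    capacity: int  # total token slots, shared across all sequences
    tokens: list = field(default_factory=list)  # (token_id, seq_id) pairs

    def add(self, token_id: int, seq_id: int) -> None:
        if len(self.tokens) >= self.capacity:
            raise OverflowError("batch full")
        self.tokens.append((token_id, seq_id))

n_ctx, n_parallel = 8, 4
n_ctx_slot = n_ctx // n_parallel  # per-sequence share of the context

batch = Batch(capacity=n_ctx)  # ONE batch of n_ctx slots, not n_parallel batches
for seq_id in range(n_parallel):
    for tok in range(n_ctx_slot):
        batch.add(tok, seq_id)

print(len(batch.tokens))  # all sequences together fill exactly n_ctx slots
```

So sizing the batch at `n_ctx` is not an over-allocation per slot; it is the combined capacity for every slot's tokens.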
> I'd really love to see this merged, is there anything that needs to be done before that happens?

@ddh0 Well, I can name a few:

- Resolve the conflicts...
> ```
> INFO VOCABFILE: './models/ggml-vocab-deepseek-llm.gguf'
> ERROR detokenize=True id=100002 expected='�' result='ø'
> ERROR detokenize=True id=100003 expected='�' result='ö'
> ERROR detokenize=True id=100004 expected='�' result='ú'
> ERROR detokenize=True id=100005 expected='�' result='ÿ'...
> ```
Nice! This should also help fix (at least part of) [Falcon's tokenization](https://huggingface.co/tiiuae/falcon-7b/blob/main/tokenizer.json), because the `Punctuation` pre-tokenizer type uses the `Po` category and not the broader `P` one. (ref: , which...