compilade
For the first time, state saving and reloading works for Jamba (both for the whole state and for single sequences). 🎉 This is implemented in

> I'm thinking that the changes for...
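For context, a rough sketch of the two flavors of state saving through the low-level `llama-cpp-python` bindings (assuming they mirror the `llama_state_*` API in `llama.h`; `ctx` and the token history are assumed to be already set up):

```python
import llama_cpp

# assumption: `ctx` is an initialized llama_context, and `tokens` holds the
# token history of sequence 0 as a ctypes array of llama_token
tokens = (llama_cpp.llama_token * 4)(1, 2, 3, 4)

# whole state: saves every sequence in the context at once
llama_cpp.llama_state_save_file(ctx, b"state.bin", tokens, len(tokens))

# single sequence: saves only seq_id 0, restorable into any sequence id later
llama_cpp.llama_state_seq_save_file(ctx, b"seq0.bin", 0, tokens, len(tokens))
```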
> I suspect your implementation of `llama_rs_cache` is a much better approach than the one I took of simply creating a duplicate `llama_kv_cache` and conditionally making the two caches have...
> I'm now able to run a lightweight mamba2 model (details below).

@gabe-l-hart Amazing! I've also merged from latest `master` (into ), and some parts differ, but most of it is similar...
Thanks for finding this and fixing it. There have been many refactors lately where the old `convert_llama_ggml_to_gguf.py` was not tested at all (mostly because I don't have old GGML models...
> Is the [failed CI check](https://github.com/ggerganov/llama.cpp/actions/runs/10333572638/job/28606150773?pr=8928) required for merging this PR? Do I need to do anything about it? It does not seem to be related to this PR....
@dlippold Note that the model referred to here is not `Mamba-Codestral-7B-v0.1`, but `Codestral-22B-v0.1`. Implementing support for `Mamba-Codestral-7B-v0.1` will not affect the performance of `Codestral-22B-v0.1`, because they use totally different architectures (Mamba-2...
@kaetemi Defragmenting when it fails should be good enough, and should be fast enough (I think). `llama_kv_cache_defrag` should do the right thing, but only at the next `llama_kv_cache_update` or `llama_decode`....
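A minimal sketch of that recover-and-retry pattern, assuming the low-level `llama-cpp-python` bindings (which mirror `llama.h`) and an already-prepared `ctx` and `batch`; a return value of 1 from `llama_decode` means no KV slot was found for the batch:

```python
import llama_cpp

# assumption: `ctx` is an initialized llama_context, `batch` a prepared llama_batch
ret = llama_cpp.llama_decode(ctx, batch)
if ret == 1:  # 1: could not find a KV slot for the batch
    llama_cpp.llama_kv_cache_defrag(ctx)  # only *schedules* the defrag...
    llama_cpp.llama_kv_cache_update(ctx)  # ...which is applied here (or at the next llama_decode)
    ret = llama_cpp.llama_decode(ctx, batch)  # retry once the cache is compacted
```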
For models like MiniCPM-V-2.5, should their `Model` subclass instead simply override `get_vocab_base_pre` to hardcode the desired pre-tokenizer? Otherwise the user needs to know the specific incantation required, and could...
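As a sketch of what that could look like in `convert_hf_to_gguf.py` (the architecture string, the `model_arch`, and the returned pre-tokenizer name are all illustrative assumptions):

```python
@Model.register("MiniCPMV")  # hypothetical architecture name
class MiniCPMVModel(Model):
    model_arch = gguf.MODEL_ARCH.MINICPM  # assumption: reuses the MiniCPM graph

    def get_vocab_base_pre(self, tokenizer) -> str:
        # hardcode the pre-tokenizer instead of relying on the chkhsh lookup,
        # so users don't need to know the right incantation
        return "llama-bpe"  # assumption: the desired pre-tokenizer for this model
```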
> @compilade Is it ok to merge this?

@Galunid I'm not sure, since this exposes a way to easily make invalid model files without any warning.

> I meant this...
> `llm_load_print_meta: n_vocab = 92550`
> `INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {4096, 92544}`

@Sakura4036 The vocab size (92550) does not match the tensor size (92544). Try to modify the `vocab_size` field...
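Assuming the `vocab_size` in question lives in the HF model's `config.json`, a one-off fix before re-converting could look like this (the target value is taken from the tensor shape in the log above):

```python
import json

# inside the HF model directory
with open("config.json") as f:
    cfg = json.load(f)

cfg["vocab_size"] = 92544  # match the actual rows of output.weight from the log

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```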