compilade
> Implementing RWKV in llama.cpp should be reconsidered, given the recent merge of Mamba SSM support.

If nobody else does it, I'll have time to work on RWKV in `llama.cpp` starting...
> I'm hitting some issues with the KV cache initialization

The KV cache for recurrent models is sized from the GGUF metadata keys `{model}.ssm.state_size`, `{model}.ssm.inner_size`, and `{model}.ssm.kernel_size`. These get read...
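For a rough picture of how those keys determine the cache size, here is a minimal sketch; the helper name and the example dimensions are mine, not the actual sizing code in `llama.cpp`:

```python
# Hedged sketch: estimate the per-sequence recurrent state size for a Mamba-style
# model from the ssm metadata keys mentioned above. The exact layout in llama.cpp
# may differ; this only illustrates how the three values bound the cache size.

def mamba_state_bytes_per_seq(d_conv: int, d_inner: int, d_state: int,
                              n_layer: int, bytes_per_elem: int = 4) -> int:
    """Rough size of one sequence's recurrent state, in bytes."""
    # Rolling convolution state: the last (d_conv - 1) inputs per inner channel.
    conv_state = (d_conv - 1) * d_inner
    # SSM hidden state: d_state values per inner channel.
    ssm_state = d_state * d_inner
    return n_layer * (conv_state + ssm_state) * bytes_per_elem

# Illustrative numbers only (roughly Mamba-130m-sized dimensions).
print(mamba_state_bytes_per_seq(d_conv=4, d_inner=1536, d_state=16, n_layer=24))
```

The point is that the state size is fixed per sequence and independent of context length, unlike a Transformer KV cache.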
> In my case, [PAD151645] [PAD151644] [PAD151643] show up in the output (Qwen-14B-Chat q4k)

Regarding the `[PAD{id}]` tokens, I recently fixed this in #5052, but it requires re-converting existing Qwen models.

> Solution:...
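For context on where those `[PAD{id}]` names come from, a hedged sketch of the idea (not the actual convert script from #5052): when the declared vocab size exceeds the number of tokens the tokenizer actually defines, the gaps can be filled with uniquely named placeholder tokens so the vocabulary is dense.

```python
# Hedged sketch of the [PAD{id}] placeholder idea; purely illustrative.

def fill_vocab_gaps(tokens: dict, vocab_size: int) -> list:
    """Return a dense token list of length vocab_size, padding missing ids."""
    return [tokens.get(i, f"[PAD{i}]") for i in range(vocab_size)]

# Hypothetical example: a tokenizer defining ids 0..2 while the model expects 6.
print(fill_vocab_gaps({0: "<s>", 1: "</s>", 2: "hello"}, 6))
# ['<s>', '</s>', 'hello', '[PAD3]', '[PAD4]', '[PAD5]']
```

Seeing those placeholders in generated text means the model sampled ids the tokenizer never defines, which is why re-converting the model matters.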
> I originally opened this issue, not so much because I had a big interest in getting it fixed for myself, but because I thought the community might want to be aware that...
@saharNooby

> Do PyTorch and ggml store values differently? I also noticed that in llama.cpp, when converting the model, dimensions are reversed, but the data is left untouched -- looks related....
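A small illustration of that convention difference (numpy only, not the conversion code itself; the indices are made up): ggml lists `ne[0]` as the innermost, contiguous dimension, while PyTorch/numpy list the outermost dimension first, so the reported shapes look reversed while the bytes stay the same.

```python
import numpy as np

# Hedged illustration of the "reversed dimensions" observation: for row-major
# data, reversing the reported shape while leaving the bytes untouched describes
# the same memory layout under ggml's ne[] convention.

w = np.arange(6, dtype=np.float32).reshape(2, 3)  # torch/numpy shape: (2, 3)
ne = w.shape[::-1]                                # ggml would report ne = (3, 2)

# Element addressed as (i0, i1) in ggml order is w[i1, i0] in numpy order,
# and both resolve to the same flat offset i1 * ne[0] + i0.
i0, i1 = 2, 1
offset = i1 * ne[0] + i0
assert w[i1, i0] == w.reshape(-1)[offset]
print("ne:", ne, "element:", w[i1, i0])
```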
I do not agree with this change (but I like the underlying intention of making `llama.cpp` less confusing). As I'm working on supporting Mamba in `llama.cpp` (see #5328), I'd like...
> @compilade Just out of curiosity, is any convolution operation performed? I see some tensors with the name `conv`, but I never see `ggml_conv_1d` or `ggml_conv_2d` being used at any...
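Since the reply above is cut off, only a general illustration: a depthwise causal convolution can be evaluated token-by-token with a shift plus an elementwise multiply-accumulate, which is one way to avoid a general conv operator. The names and shapes below are hypothetical, not the actual implementation.

```python
import numpy as np

# Hedged sketch of a depthwise causal 1-d convolution evaluated one token at a
# time, using only a shifted window and elementwise ops.

def conv_step(state: np.ndarray, x: np.ndarray,
              weight: np.ndarray, bias: np.ndarray):
    """state: (d_inner, d_conv-1) past inputs; x: (d_inner,) new input.
    weight: (d_inner, d_conv) depthwise kernel; bias: (d_inner,)."""
    window = np.concatenate([state, x[:, None]], axis=1)  # (d_inner, d_conv)
    y = (window * weight).sum(axis=1) + bias              # per-channel dot product
    new_state = window[:, 1:]                             # shift for the next token
    return y, new_state

d_inner, d_conv = 4, 3
state = np.zeros((d_inner, d_conv - 1), dtype=np.float32)
y, state = conv_step(state, np.ones(d_inner, np.float32),
                     np.full((d_inner, d_conv), 0.5, np.float32),
                     np.zeros(d_inner, np.float32))
print(y)
```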
> Regarding the KV questions:
> IIUC one slot is needed per sequence, so in that sense the KV cache size could be interpreted as the maximum number of distinct...
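To make the "one slot per sequence" point concrete, a toy sketch (hypothetical class, not the actual cache code): each distinct `seq_id` claims one recurrent-state slot, so the number of slots bounds how many distinct sequences can be tracked at once.

```python
# Hedged sketch of slot-per-sequence bookkeeping; purely illustrative.

class RecurrentSlots:
    def __init__(self, n_slots: int):
        self.free = list(range(n_slots))
        self.by_seq = {}  # seq_id -> slot index

    def slot_for(self, seq_id: int) -> int:
        """Reuse the sequence's slot, or claim a free one."""
        if seq_id not in self.by_seq:
            if not self.free:
                raise RuntimeError("no free state slot: too many distinct sequences")
            self.by_seq[seq_id] = self.free.pop()
        return self.by_seq[seq_id]

slots = RecurrentSlots(n_slots=2)
print(slots.slot_for(0), slots.slot_for(1))  # two sequences fit; a third would fail
```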
I've been thinking about what parts of the KV cache API can and cannot be supported for Mamba. In general, functions which operate on whole sequences or the whole KV...
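To illustrate that distinction (hypothetical methods, only loosely mirroring the KV cache API): operations on whole sequences map to copying or clearing a state blob, while operations on a token range within a sequence have nothing to act on, because a recurrent state is a single rolled-up summary rather than per-token entries.

```python
from typing import Optional

# Hedged sketch of which cache operations translate to a recurrent state.
# These are not the actual llama.cpp functions.

class RecurrentState:
    def __init__(self):
        self.states = {}  # seq_id -> rolled-up state blob for that sequence

    def seq_cp(self, src: int, dst: int):
        # Copying a whole sequence is just duplicating its state blob.
        self.states[dst] = self.states[src]

    def seq_rm(self, seq_id: int, p0: Optional[int] = None, p1: Optional[int] = None):
        # Removing a whole sequence is fine; removing only a token range is not,
        # because past tokens are not stored individually, only their summary.
        if p0 is None and p1 is None:
            self.states.pop(seq_id, None)
        else:
            raise NotImplementedError("cannot remove a token range from a recurrent state")
```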
Now that multiple sequences can be processed at once, I've been trying to make the `server` example work with Mamba.

>> I think that most of what is currently done with...