compilade
> It sounds like having a simple fallback of expected filenames would be a reasonable thing to include here? I don't know that we want to maintain a ton of...
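As a rough sketch of what such a fallback could look like (illustrative only; the function name and the candidate filenames here are hypothetical, not the actual convert-script logic):

```python
from pathlib import Path

# Hypothetical sketch: try a short list of expected filenames before failing.
def find_model_file(model_dir: Path) -> Path:
    candidates = ["model.safetensors", "pytorch_model.bin", "consolidated.00.pth"]
    for name in candidates:
        path = model_dir / name
        if path.exists():
            return path
    raise FileNotFoundError(f"none of the expected model files found in {model_dir}")
```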
@Tangshengku Bi-Mamba seems amazing!

> The ppl is pretty bad with more than 3500+. So, have you ever tested the performance of your implementation before?

I did test it when...
> However, I first tried to use mamba2-2.7 model and computed the ppl on wiki dataset

@Tangshengku Which model exactly is causing you problems? I can't reproduce the problem with...
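For reference, perplexity is simply the exponent of the mean negative log-likelihood over the evaluated tokens, so a value above 3500 on a wiki dataset usually points to a broken conversion rather than a merely weak model. A minimal sketch of the formula:

```python
import numpy as np

def perplexity(token_logprobs: np.ndarray) -> float:
    # ppl = exp(mean negative log-likelihood), in nats per token
    return float(np.exp(-np.mean(token_logprobs)))

# e.g. a mean NLL of ~8.2 nats/token already gives ppl ≈ 3600
print(perplexity(np.full(1000, -8.2)))
```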
@EthanFS I don't think these small Mamba (1 and 2) models are instruction-tuned, so I wouldn't expect them to ever really "finish" their output (although there *are* cases where they...
> Instead of computing the w_scale and w_bias during tensor transformation, I compute the w_scale and w_bias during inference on the activation, which is equivalent to the operation on the...
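That equivalence holds because a per-output-row scale and bias commute with the matmul: `(s*Wb + b) @ x == s*(Wb @ x) + b*sum(x)`. A small NumPy sketch (the shapes and the exact binarization formula are assumed here, not taken from this PR):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)

# Assumed row-wise binarization: W ≈ w_scale * sign(W) + w_bias
w_bin = np.sign(W)
w_scale = np.abs(W).mean(axis=1, keepdims=True)
w_bias = W.mean(axis=1, keepdims=True)

# Option A: fold scale/bias into the weight during tensor transformation
y_folded = (w_scale * w_bin + w_bias) @ x

# Option B: keep the binary weight and apply scale/bias to the
# activation at inference time
y_runtime = w_scale.squeeze() * (w_bin @ x) + w_bias.squeeze() * x.sum()

assert np.allclose(y_folded, y_runtime)
```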
> BTW, what do you mean by 'TQ1_0 and TQ2_0' not being good for this model? Do you mean the ppl will be bad, or the speed & memory will be bad?...
There is a problem with multi-user (and/or parallel-sequence) inference for recurrent models (it also happens on `master`, so this branch might have inherited it by merging the latest changes). I'll try to...
> but there's also something else which makes it seem like recurrent states of sequences are not properly isolated

I found the problem! It was introduced in #12181:

https://github.com/ggml-org/llama.cpp/blob/791998b42d6cd6edb31e4d5824e29c100cecd40b/src/llama-graph.cpp#L287-L291

The...
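For anyone following along, here's a toy illustration (not llama.cpp code) of the invariant that was broken: each sequence must resolve to its own recurrent state cell, so identical inputs give identical outputs regardless of what other sequences are doing:

```python
# Toy recurrence h = 0.5*h + x; the point is only that the state of one
# sequence must never leak into another.
class RecurrentStatePool:
    def __init__(self):
        self.cells: dict[int, float] = {}  # seq_id -> running state

    def step(self, seq_id: int, x: float) -> float:
        h = self.cells.get(seq_id, 0.0)
        h = 0.5 * h + x
        self.cells[seq_id] = h
        return h

pool = RecurrentStatePool()
a = [pool.step(0, 1.0) for _ in range(3)]  # sequence 0
b = [pool.step(1, 1.0) for _ in range(3)]  # sequence 1, same inputs
assert a == b  # holds only if the two states are properly isolated
```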
@gabe-l-hart I've been attempting to adapt the CUDA implementation of the `SSM_SCAN` operator to the changes made for Mamba-2 (some shape changes and an extra input tensor for the state...
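Roughly, the recurrence the operator implements looks like the following simplified NumPy sketch (ignoring heads, groups, and the exact ggml tensor layout; in Mamba-2 the decay `A` is a scalar per head, simplified here to one per channel):

```python
import numpy as np

def ssm_scan(x, dt, A, B, C, h0):
    # x, dt: (T, d_inner); A: (d_inner,); B, C: (T, d_state); h0: (d_inner, d_state)
    h = h0.copy()
    ys = []
    for t in range(x.shape[0]):
        dA = np.exp(dt[t][:, None] * A[:, None])       # per-channel decay
        dBx = (dt[t] * x[t])[:, None] * B[t][None, :]  # input contribution
        h = dA * h + dBx                               # state update
        ys.append(h @ C[t])                            # readout, (d_inner,)
    return np.stack(ys), h  # outputs (T, d_inner) and the final state
```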
@vineel96 You do not need to pull #5328, since it was merged into the `master` branch a while ago. This means you can use the latest version of `llama.cpp`,...