compilade

Showing 109 comments by compilade

> Re using bigger batch sizes - does this mean if memory allows, imatrix should in fact be faster to process via PP?

@danielhanchen Currently, with `llama-imatrix` from the `master`...
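For context, here is a minimal C++ sketch of the kind of per-channel accumulation an importance matrix performs; it is not the actual `llama-imatrix` code, and `imatrix_stats`/`accumulate` are made-up names. The point is that the statistics are sums over tokens, so processing the same data in larger batches only changes throughput, not the resulting matrix.

```cpp
// Minimal sketch (not the actual llama.cpp code) of per-channel accumulation
// for an importance matrix: sum of squared activations per input channel.
#include <cstdio>
#include <vector>

struct imatrix_stats {
    std::vector<double> sum_sq; // one accumulator per input channel (hypothetical layout)
    long long n_tokens = 0;
};

// activations: n_tokens rows of n_embd values for one mat-mul input
static void accumulate(imatrix_stats & stats, const std::vector<float> & activations, int n_embd) {
    if (stats.sum_sq.empty()) {
        stats.sum_sq.resize(n_embd, 0.0);
    }
    const int n_tokens = (int) (activations.size() / n_embd);
    for (int t = 0; t < n_tokens; ++t) {
        for (int c = 0; c < n_embd; ++c) {
            const float x = activations[t*n_embd + c];
            stats.sum_sq[c] += (double) x * x;
        }
    }
    stats.n_tokens += n_tokens;
}

int main() {
    imatrix_stats stats;
    // two "batches" of the same token stream; the final sums do not depend
    // on how the tokens were grouped into batches
    accumulate(stats, {0.5f, -1.0f, 2.0f, 0.0f}, /*n_embd=*/2);
    accumulate(stats, {1.5f,  0.5f},             /*n_embd=*/2);
    printf("tokens=%lld ch0=%g ch1=%g\n", stats.n_tokens, stats.sum_sq[0], stats.sum_sq[1]);
}
```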

To address some feedback I got recently, I've added a warning when writing in the legacy format so that it's more obvious what is happening.

```
save_imatrix: saving to legacy...
```
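A hedged sketch of what that warning amounts to, with a hypothetical flag and file name (the real `save_imatrix` has more logic around format selection):

```cpp
// Minimal sketch, not the actual save_imatrix implementation: print a visible
// warning whenever the legacy output format was requested, then proceed to write.
#include <cstdio>

static void save_imatrix_sketch(const char * fname, bool legacy_format) {
    if (legacy_format) {
        fprintf(stderr, "save_imatrix: saving to legacy format at '%s'\n", fname);
    }
    // ... write the collected statistics in the selected format ...
}

int main() {
    save_imatrix_sketch("imatrix.dat", /*legacy_format=*/true);
}
```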

> @compilade Time to merge this (and adapt #12718 afterwards)?

@CISC Sure. I hope I've tested enough edge cases. Will merge at 16:00 UTC on 2025-07-19 (in around 10 hours),...

I'm currently working on a big refactor of how Mamba (and Jamba) works, so that all sequences in a sub-batch have the same length (initially only for models with...
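To illustrate the idea, here is a simplified stand-in (not the actual batch-splitting code; the names are made up): a greedy way to get equal-length sub-batches is to repeatedly take the minimum remaining length from every sequence that still has tokens pending.

```cpp
// A minimal sketch of splitting a batch so that every sequence in a given
// sub-batch contributes the same number of tokens.
#include <cstdio>
#include <vector>

struct sub_batch {
    int n_per_seq;             // tokens taken from each included sequence
    std::vector<int> seq_ids;  // which sequences are included
};

static std::vector<sub_batch> split_equal(std::vector<int> remaining /* tokens left per sequence */) {
    std::vector<sub_batch> out;
    for (;;) {
        int n_min = 0;
        for (int r : remaining) {
            if (r > 0 && (n_min == 0 || r < n_min)) n_min = r;
        }
        if (n_min == 0) break; // nothing left to split
        sub_batch sb;
        sb.n_per_seq = n_min;
        for (size_t s = 0; s < remaining.size(); ++s) {
            if (remaining[s] > 0) {
                sb.seq_ids.push_back((int) s);
                remaining[s] -= n_min;
            }
        }
        out.push_back(sb);
    }
    return out;
}

int main() {
    // three sequences with 5, 3 and 3 pending tokens
    for (const auto & sb : split_equal({5, 3, 3})) {
        printf("sub-batch: %d token(s) from each of %zu sequence(s)\n", sb.n_per_seq, sb.seq_ids.size());
    }
}
```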

I've pushed the refactor to use equal-sequence-length sub-batch splitting for recurrent models. This greatly simplifies the SSM operations; `inp_s_seq` is no longer needed. And recurrent state slot allocation is now...
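As a rough illustration of per-sequence state slots (a hypothetical pool, not the actual recurrent-state cache in llama.cpp), each sequence can keep a fixed slot across sub-batches and give it back when the sequence is removed:

```cpp
// Minimal sketch of allocating one recurrent-state slot per sequence.
#include <cstdio>
#include <unordered_map>
#include <vector>

struct state_pool {
    std::vector<int> free_slots;
    std::unordered_map<int, int> slot_of_seq; // seq_id -> slot index

    explicit state_pool(int n_slots) {
        for (int i = n_slots - 1; i >= 0; --i) free_slots.push_back(i);
    }

    // returns the slot for this sequence, allocating one if needed (-1 if full)
    int slot_for(int seq_id) {
        auto it = slot_of_seq.find(seq_id);
        if (it != slot_of_seq.end()) return it->second;
        if (free_slots.empty()) return -1;
        int slot = free_slots.back();
        free_slots.pop_back();
        slot_of_seq[seq_id] = slot;
        return slot;
    }

    void release(int seq_id) {
        auto it = slot_of_seq.find(seq_id);
        if (it == slot_of_seq.end()) return;
        free_slots.push_back(it->second);
        slot_of_seq.erase(it);
    }
};

int main() {
    state_pool pool(2);
    printf("seq 7 -> slot %d\n", pool.slot_for(7));
    printf("seq 3 -> slot %d\n", pool.slot_for(3));
    pool.release(7);
    printf("seq 9 -> slot %d (reused)\n", pool.slot_for(9));
}
```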

> @compilade Do you have local changes in this branch? Would like to merge latest `master` here

@ggerganov I do have local changes, which I've pushed now. I was in...

> The change is quite big and I'm having a bit of trouble merging it all at once. Wonder if we should take a more step-by-step approach.

I agree...

Now that variable GQA support is in `master` (thanks to #7359, which has been merged), I plan to separate the advanced batch splits feature into its own PR for easier...
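Roughly what "variable GQA" implies for code that sizes buffers from the head count, as a hedged sketch with made-up numbers: a hybrid model like Jamba has layers with no attention at all, so the KV head count has to be tracked per layer.

```cpp
// Illustrative only: per-layer KV head counts, where 0 means "no attention
// on this layer". Anything sized from n_head_kv must be computed per layer.
#include <cstdio>
#include <vector>

int main() {
    const int n_embd_head = 128;
    const std::vector<int> n_head_kv = {8, 0, 0, 8, 0, 0, 8, 0};

    size_t kv_elems_per_token = 0;
    for (size_t il = 0; il < n_head_kv.size(); ++il) {
        // K and V rows for this layer, per token
        kv_elems_per_token += 2ull * (size_t) n_head_kv[il] * n_embd_head;
        printf("layer %zu: n_head_kv = %d\n", il, n_head_kv[il]);
    }
    printf("KV cache elements per token (all layers): %zu\n", kv_elems_per_token);
}
```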

> Any updates on this since Jamba 1.5 is now out?

@Autumnlight02 Basically, since was merged, now I need to resolve a *very big* merge conflict because I didn't keep...

Some progress update on Jamba: I began resolving the merge conflicts, and there were more than 2000 lines of conflicts (basically half of this PR). This is manageable. While I've...