compilade

Results: 109 comments by compilade

I've fixed the pooled embeddings problem with Mamba by making it only process a single sequence per `ubatch`. When the sequences are short, this is slightly slower than processing...
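A minimal sketch of the idea (all names here are hypothetical, not llama.cpp's actual API): split a mixed batch into per-sequence micro-batches, embed one sequence at a time, then mean-pool its token embeddings so pooling never mixes tokens from different sequences.

```python
import numpy as np

def split_by_sequence(tokens, seq_ids):
    """Group a batch of tokens into per-sequence micro-batches (ubatches)."""
    ubatches = {}
    for tok, sid in zip(tokens, seq_ids):
        ubatches.setdefault(sid, []).append(tok)
    return list(ubatches.values())

def pooled_embedding(embed, ubatch):
    """Embed one sequence at a time, then mean-pool its token embeddings."""
    vecs = np.stack([embed(tok) for tok in ubatch])
    return vecs.mean(axis=0)

# Toy embedding: map a token id to a 2-d vector.
embed = lambda t: np.array([float(t), float(t) * 2.0])
batches = split_by_sequence([1, 2, 3, 4], [0, 0, 1, 1])
pools = [pooled_embedding(embed, ub) for ub in batches]
# pools[0] is the mean over tokens 1 and 2 only: [1.5, 3.0]
```

Processing one sequence per ubatch trades some throughput on short sequences for correctness of the pooled result.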

> Not sure if I'm understanding the comment correctly @jukofyork, but the logic I'm using to identify the most influential tensors/layers is to simply average the importance scores (IS) for...
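The averaging logic described above can be sketched like this (tensor names and score values are made up for illustration): accumulate per-sample importance scores (IS) for each tensor, average them, and rank tensors by their mean.

```python
from collections import defaultdict

def rank_tensors(importance_scores):
    """Average per-sample importance scores (IS) for each tensor,
    then rank tensors from most to least influential."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tensor_name, score in importance_scores:
        sums[tensor_name] += score
        counts[tensor_name] += 1
    means = {name: sums[name] / counts[name] for name in sums}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores from two calibration samples:
scores = [("blk.0.attn_q", 0.9), ("blk.0.attn_q", 0.7),
          ("blk.1.ffn_up", 0.3), ("blk.1.ffn_up", 0.5)]
ranking = rank_tensors(scores)
# blk.0.attn_q averages higher than blk.1.ffn_up, so it ranks first
```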

I'd like it very much if they released a smaller version of their model. I don't have enough RAM to run Mixtral (only have 8GB), and Jamba seems to be...

> Any update on Jamba support? I've worked on refactoring the KV cache in the past weeks to allow managing both recurrent states and Attention's KV cache at once. (See...
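A rough sketch of the concept (a hypothetical structure, not llama.cpp's actual cache implementation): a hybrid cache keeps, per sequence, a fixed-size recurrent state that is overwritten in place alongside a growing list of attention KV entries.

```python
from dataclasses import dataclass, field

@dataclass
class SeqCache:
    """Per-sequence slot for a hybrid model: recurrent layers keep a
    fixed-size state, attention layers keep growing (K, V) pairs."""
    recurrent_state: list = field(default_factory=lambda: [0.0] * 4)
    kv_entries: list = field(default_factory=list)

class HybridCache:
    def __init__(self):
        self.seqs = {}

    def slot(self, seq_id):
        return self.seqs.setdefault(seq_id, SeqCache())

    def append_token(self, seq_id, k, v, new_state):
        s = self.slot(seq_id)
        s.kv_entries.append((k, v))    # attention: grows with context length
        s.recurrent_state = new_state  # Mamba-style: fixed size, overwritten

cache = HybridCache()
cache.append_token(0, k=[1.0], v=[2.0], new_state=[0.1] * 4)
cache.append_token(0, k=[3.0], v=[4.0], new_state=[0.2] * 4)
```

The point of managing both in one structure is that operations like clearing or copying a sequence have to touch the recurrent state and the KV entries together.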

> For your endeavors, could I 'Buy You a Coffee' to help support? @severian42 I appreciate the offer (it means a lot!), but I can't accept for now. Receiving international...

Okay, turns out I only had to put like, 2 to 3 more days of work on this and BAM **it works**. As of today, in branch `refactor-kv-cache`, using the...

There is still more work I need to put into this. I've got inference working, but things that are not yet done are: state saving and reloading to and...

> how can they work if the issue is not complete? @ELigoP Well, technically the layout of the GGUF files doesn't really need to be changed further for Jamba support,...

> They adopt a channel-wise scaling factor compared to the tensor-level ones. Maybe a separate kernel can be built to apply scales outside of the matmul kernels? Hmm,...
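The distinction under discussion can be sketched numerically (hypothetical helper names; this is not any particular kernel's API): per-tensor scaling applies one factor everywhere, while channel-wise scaling applies one factor per output channel, which can indeed be done as a separate broadcast step outside the matmul.

```python
import numpy as np

def dequant_per_tensor(q, scale):
    """One scale for the whole tensor."""
    return q * scale

def dequant_per_channel(q, scales):
    """One scale per output channel (row), applied outside the
    matmul by broadcasting over the row dimension."""
    return q * scales[:, None]

q = np.array([[1, 2], [3, 4]], dtype=np.float32)
a = dequant_per_tensor(q, 0.5)
b = dequant_per_channel(q, np.array([0.5, 0.25], dtype=np.float32))
# b scales each row independently: [[0.5, 1.0], [0.75, 1.0]]
```

Keeping the channel scales out of the matmul kernel keeps the hot loop uniform, at the cost of an extra elementwise pass over the output.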