Results: 14 comments by Dev

Are there any updates on the AMX int4 progress?

I will take a look, and I'm looking forward to it!

Pulled the SOSP branch and ran some quick tests on DeepseekR1 with AMXInt4. Loading takes quite a while (about 3 hours), as it converts the quant during loading from...

Little update on this. I ran AMXInt4 with multi-GPU and was able to load an even larger context. At a 40K context, prefill speed got up to 141 T/s (...
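As a back-of-the-envelope check on what that prefill rate means in practice, here is a small sketch (it assumes the 141 T/s rate is sustained across the whole prompt, which real prefill rarely is, so treat it as an optimistic lower bound on wall time):

```python
# Rough estimate: time to prefill a 40K-token prompt at the reported
# 141 tokens/second prefill rate. Assumes a constant rate end to end;
# actual prefill throughput typically degrades as context grows.
context_tokens = 40_000
prefill_tps = 141.0

seconds = context_tokens / prefill_tps
print(f"~{seconds:.0f} s (~{seconds / 60:.1f} min) to prefill {context_tokens} tokens")
# → ~284 s (~4.7 min) to prefill 40000 tokens
```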

Thanks @ovowei. I hope to have time this weekend to quantize the weights and will upload them if no one gets to it first.

I pulled the working AMXInt4 fork of Ktransformers and saw some meaningful improvements. Would love to see these implemented in ik_llama too: https://github.com/kvcache-ai/ktransformers/issues/1492#issuecomment-3281024307 > I have a fork of...

Hi, I tried your fork and the uplift wasn't noticeable.
prompt eval time = 1016.33 ms / 16 tokens (63.52 ms per token, 15.74 tokens per second)
eval time...
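For anyone comparing runs, the per-token and tokens-per-second figures in that llama.cpp-style timing line are derivable from the total time and token count. A minimal sketch (the helper name is mine, not part of llama.cpp):

```python
# Hypothetical helper: recompute the derived figures from a llama.cpp-style
# timing report, given total elapsed milliseconds and the token count.
def throughput(total_ms: float, n_tokens: int) -> tuple[float, float]:
    """Return (ms per token, tokens per second)."""
    ms_per_token = total_ms / n_tokens
    tokens_per_second = n_tokens / (total_ms / 1000.0)
    return ms_per_token, tokens_per_second

ms_tok, tps = throughput(1016.33, 16)
print(f"{ms_tok:.2f} ms per token, {tps:.2f} tokens per second")
# → 63.52 ms per token, 15.74 tokens per second
```

This matches the numbers quoted above, so the two summary figures are redundant with the raw timing; when comparing forks, the raw total_ms per fixed prompt is the cleaner metric.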

Long context beyond ~80K tokens seems to cause instabilities, but up to that point I have had positive experiences with this package. I am very much looking forward to the AMXInt4 backend as...

I get it. I've followed your adventure through this package and tend to agree; I'm just speaking from my own experience. Edit: I run ik_llama and @Gadflyii's llama.cpp AMX fork...

I can only speak from my own experience, as I don't have the hardware to test their best-case runs, but with a 24-core W7-3455 with 512 GB of DDR5-4800 and...