Are there any updates on the AMX int4 progress?
I will take a look, and I'm looking forward to it!
Pulled the SOSP branch and ran some quick tests with DeepSeek R1 using AMXInt4. Loading takes quite a while (3 hours) as it converts the quant during loading from...
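(For anyone curious what that load-time conversion involves, here is a minimal sketch of per-group symmetric int4 quantization. The function names and group size are illustrative, not the actual KTransformers code path; real backends also pack two 4-bit values per byte and may use zero points.)

```python
import numpy as np

def quantize_int4_grouped(w: np.ndarray, group_size: int = 128):
    """Symmetric per-group int4 quantization of a flat weight vector.

    Illustrative sketch only: one fp32 scale per group, values clipped
    to the signed 4-bit range [-8, 7].
    """
    w = w.reshape(-1, group_size)
    # Choose each group's scale so its max magnitude maps to 7.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4_grouped(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(-1)

# Quick round-trip check on random weights.
w = np.random.randn(4096).astype(np.float32)
q, s = quantize_int4_grouped(w)
err = np.abs(dequantize_int4_grouped(q, s) - w).mean()
print(f"mean abs quantization error: {err:.4f}")
```

Doing this pass over every tensor of a ~600B-parameter model is why the one-time conversion dominates load time; once converted, a cached quantized copy avoids repeating it.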
A little update on this: I ran AMXInt4 with multi-GPU and was able to load an even larger context. At 40K context, prefill speed got up to 141 T/s (...
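(For scale, a quick back-of-the-envelope check of what 141 T/s means in wall-clock time at that context length; this is pure arithmetic on the reported number, nothing more:)

```python
context_tokens = 40_000
prefill_tps = 141  # reported prefill speed, tokens/s
seconds = context_tokens / prefill_tps
print(f"~{seconds:.0f} s (~{seconds / 60:.1f} min) to prefill 40K tokens")
# ~284 s (~4.7 min)
```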
Thanks @ovowei. I hope to have time this weekend to quantize the weights and will upload them if no one gets to it first.
I pulled the working AMXInt4 fork of KTransformers and saw some meaningful improvements. Would love to see these implemented in ik_llama too: https://github.com/kvcache-ai/ktransformers/issues/1492#issuecomment-3281024307 > I have a fork of...
Hi, I tried your fork and the uplift wasn't noticeable. prompt eval time = 1016.33 ms / 16 tokens (63.52 ms per token, 15.74 tokens per second) eval time...
Long context over ~80K seems to cause instabilities, but below that I have had positive experiences with this package. I am very much looking forward to the AMXInt4 backend as...
I get it. I've followed your adventure through this package and tend to agree; I'm just speaking from my own experience. Edit: I run ik_llama and @Gadflyii's llama.cpp AMX fork...
I can only speak from my own experience as I don't have the hardware to test their best-case runs, but with a 24-core W7-3455 with 512 GB of DDR5-4800 and...
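(Rough context on why that hardware matters for CPU inference: decode speed is largely memory-bandwidth-bound. A quick estimate of the platform's theoretical peak, assuming the Xeon W-3400 series' 8 DDR5 channels, which is an assumption on my part, not something stated above:)

```python
channels = 8            # assumed: Xeon W-3400 platform, 8 DDR5 channels
transfers_per_s = 4.8e9 # DDR5-4800: 4.8 GT/s per channel
bytes_per_transfer = 8  # 64-bit channel width
bw_gbs = channels * transfers_per_s * bytes_per_transfer / 1e9
print(f"theoretical peak memory bandwidth: ~{bw_gbs:.0f} GB/s")  # ~307 GB/s
```

As a rule of thumb, decode tokens/s is capped near effective bandwidth divided by the bytes of weights read per token, which is why int4 quants help so much on CPU.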