aikitoria

Results: 65 comments by aikitoria

Why is this integration AMX exclusive? The original KTransformers library was not.

This was also recently implemented by sglang: https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/#multi-numa-parallelism
By splitting the weights between NUMA nodes, and then doing tensor parallel between those nodes, bandwidth utilization of CPU inference can be...
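To make the idea concrete, here is a minimal sketch of the weight split in plain Python. It is not sglang's implementation: the "nodes" are ordinary lists, there is no real NUMA pinning or threading, and the helper names (`split_rows`, `tensor_parallel_matvec`) are made up for illustration. It only shows that each node can multiply against its own slice of the weights and the concatenated result matches the unsplit product.

```python
# Illustrative only: "nodes" are plain Python lists, not real NUMA nodes.
# All function names here are hypothetical, not taken from sglang or llama.cpp.

def matvec(rows, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]

def split_rows(weight, num_nodes):
    """Split the weight matrix into contiguous row shards, one per node."""
    shard = (len(weight) + num_nodes - 1) // num_nodes  # ceiling division
    return [weight[i * shard:(i + 1) * shard] for i in range(num_nodes)]

def tensor_parallel_matvec(weight, x, num_nodes):
    """Each node multiplies only its local shard; outputs are concatenated.

    On real hardware each shard would live in one node's local memory and
    be processed by threads pinned to that node, so weight reads stay local.
    """
    out = []
    for local_rows in split_rows(weight, num_nodes):
        out.extend(matvec(local_rows, x))
    return out

weight = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]

# The sharded result matches the unsplit product.
assert tensor_parallel_matvec(weight, x, 2) == matvec(weight, x)
```

Only the small input vector and the output slices cross node boundaries in this scheme; the large weight shards are never read remotely, which is where the bandwidth benefit is expected to come from.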

> wonder what proper numa would accomplish

Let's use dual-socket EPYC Turin as an example. When all 24 channels are filled, it will have a total memory bandwidth of...

I have failed to search apparently. Dupe of https://github.com/ikawrakow/ik_llama.cpp/issues/627