aikitoria
Why is this integration AMX exclusive? The original KTransformers library was not.
This was also recently implemented by sglang: https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/#multi-numa-parallelism By splitting the weights between NUMA nodes, and then doing tensor parallelism between those nodes, bandwidth utilization of CPU inference can be...
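A minimal sketch of the idea (not the sglang implementation): with a row-parallel split, each NUMA node holds one shard of the weight matrix in its local memory, multiplies against the matching slice of the activation, and the partial results are summed (an all-reduce in a real multi-node setup). The function name and two-node split are illustrative assumptions.

```python
import numpy as np

def row_parallel_matmul(x, weight, num_nodes=2):
    # Split the weight's input dimension across NUMA nodes; each node
    # only needs the matching slice of the activation as well.
    w_shards = np.array_split(weight, num_nodes, axis=0)
    x_shards = np.array_split(x, num_nodes, axis=-1)
    # Each node computes a partial product from NUMA-local memory,
    # so reads hit the local memory controller instead of the
    # cross-socket interconnect.
    partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
    # All-reduce: sum the partial results across nodes.
    return sum(partials)

x = np.random.default_rng(0).standard_normal((4, 8))
w = np.random.default_rng(1).standard_normal((8, 16))
assert np.allclose(row_parallel_matmul(x, w), x @ w)
```

The point of the split is that each socket streams only its own shard from local DRAM, so the aggregate bandwidth of both sockets is usable instead of just one.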
> wonder what proper numa would accomplish Let's use dual-socket Epyc Turin as an example. When all 24 channels are filled, it will have a total memory bandwidth of...
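As a back-of-envelope illustration (assuming 12 channels per socket populated with DDR5-6000, which Turin supports; actual speeds depend on the configuration):

```python
# Aggregate theoretical memory bandwidth for a dual-socket Epyc Turin
# system, assuming all 24 channels run DDR5-6000.
channels = 24
transfers_per_sec = 6000e6      # 6000 MT/s per channel
bytes_per_transfer = 8          # 64-bit DDR5 channel
per_channel_gb_s = transfers_per_sec * bytes_per_transfer / 1e9  # 48 GB/s
total_gb_s = channels * per_channel_gb_s
print(total_gb_s)  # 1152.0
```

Without NUMA-aware placement, a single process mostly reading through one socket's controllers sees only a fraction of that aggregate figure.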
Apparently I failed to search first. Dupe of https://github.com/ikawrakow/ik_llama.cpp/issues/627
I see, will reopen then.