aikitoria
Why is this integration AMX exclusive? The original KTransformers library was not.
This was also recently implemented by sglang: https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/#multi-numa-parallelism By splitting the weights between NUMA nodes, and then doing tensor parallelism between those nodes, bandwidth utilization of CPU inference can be...
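A minimal sketch of the idea (not the sglang implementation): with a row-parallel split, each NUMA node holds one shard of the weight matrix in its local memory, multiplies against the matching slice of the activation, and the partial results are summed (an all-reduce in a real multi-node setup). The function name and two-node split are illustrative assumptions.

```python
import numpy as np

def row_parallel_matmul(x, weight, num_nodes=2):
    # Split the weight's input dimension across NUMA nodes; each node
    # only needs the matching slice of the activation as well.
    w_shards = np.array_split(weight, num_nodes, axis=0)
    x_shards = np.array_split(x, num_nodes, axis=-1)
    # Each node computes a partial product from NUMA-local memory,
    # so reads hit the local memory controller instead of the
    # cross-socket interconnect.
    partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
    # All-reduce: sum the partial results across nodes.
    return sum(partials)

x = np.random.default_rng(0).standard_normal((4, 8))
w = np.random.default_rng(1).standard_normal((8, 16))
assert np.allclose(row_parallel_matmul(x, w), x @ w)
```

The point of the split is that each socket streams only its own shard from local DRAM, so the aggregate bandwidth of both sockets is usable instead of just one.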
> wonder what proper numa would accomplish Let's use dual-socket Epyc Turin as an example. When all 24 channels are filled, it will have a total memory bandwidth of...
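As a back-of-envelope illustration (assuming 12 channels per socket populated with DDR5-6000, which Turin supports; actual speeds depend on the configuration):

```python
# Aggregate theoretical memory bandwidth for a dual-socket Epyc Turin
# system, assuming all 24 channels run DDR5-6000.
channels = 24
transfers_per_sec = 6000e6      # 6000 MT/s per channel
bytes_per_transfer = 8          # 64-bit DDR5 channel
per_channel_gb_s = transfers_per_sec * bytes_per_transfer / 1e9  # 48 GB/s
total_gb_s = channels * per_channel_gb_s
print(total_gb_s)  # 1152.0
```

Without NUMA-aware placement, a single process mostly reading through one socket's controllers sees only a fraction of that aggregate figure.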
Apparently I failed to search first. Dupe of https://github.com/ikawrakow/ik_llama.cpp/issues/627
I see, will reopen then.