Azure
> @Azure-Tang Hi Tang, thanks for your work on the implementation of TeraPipe on Megatron-LM. Did you try comparing the performance against no TeraPipe? How much benefit does it bring?...
Why is this merge blocked?
Hi, I notice that the core MMA function for bf16 is supported by vLLM's GPTQ Marlin. It seems a few changes could enable this feature. https://github.com/vllm-project/vllm/blob/main/csrc/quantization/gptq_marlin/gptq_marlin.cu#L89 I really need...
> @Azure-Tang Is bf16 support done? Have you made a PR elsewhere?

Not done yet, maybe next week.
> Are there any updates on the AMX int4 progress?

You may find it on the SOSP branch, which will be merged after being fully tested.
> We intend to run DeepSeek-R1 671B inference on the Huawei Ascend 910B. What work is needed to adapt to this new hardware?

We currently have no Ascend devices and no teammates familiar with Ascend hardware, so we may not be able to support you on this. You could check whether the Marlin kernel can run on Ascend cards: https://github.com/IST-DASLab/marlin
> I have one — how can I contact you? The WeChat group is full.

Thank you very much 🙏 But this support probably won't be in the short-term plan. If you want to join the group, we have updated the QR code on the homepage.
> > > We intend to run DeepSeek-R1 671B inference on the Huawei Ascend 910B. What work is needed to adapt to this new hardware?
> >
> > We currently have no Ascend devices and no teammates familiar with Ascend hardware, so we may not be able to support you on this. You could check whether the Marlin kernel can run on Ascend cards: https://github.com/IST-DASLab/marlin
>
> So is the interface between KTransformers and the hardware just Marlin's matrix-multiplication kernel? To adapt new hardware, would it be enough to plug a hardware-specific FP16×INT4 matmul kernel into KTransformers?

It looks that way. You can build a new operator modeled on our linearMarlin operator (in linear.py). You may also need to pay attention to the weight-loading part: we dequantize the GGUF weights first and then use Marlin to pack them into Marlin's weight format. Feel free to reach out if anything is unclear.
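The adaptation path described above can be sketched roughly as follows. This is only an illustrative mock, not the real KTransformers API: the class name `CustomHardwareLinear`, the helper `dequant_int4`, and the packing layout are all hypothetical stand-ins for a hardware-specific operator modeled on linearMarlin, and NumPy matmul stands in for the vendor's FP16×INT4 kernel.

```python
import numpy as np

def dequant_int4(packed: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Unpack two 4-bit values per byte (GGUF-style) and apply
    per-output-channel scales. Layout here is illustrative only."""
    low = (packed & 0x0F).astype(np.int8) - 8        # map [0, 15] -> [-8, 7]
    high = ((packed >> 4) & 0x0F).astype(np.int8) - 8
    w = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.float16)
    w[:, 0::2] = low
    w[:, 1::2] = high
    return w * scales[:, None].astype(np.float16)

class CustomHardwareLinear:
    """Hypothetical stand-in for a new linear operator, following the
    described pattern: dequantize GGUF int4 weights on load, then pack
    them into whatever layout the target hardware's kernel expects."""

    def load(self, packed_weight: np.ndarray, scales: np.ndarray) -> None:
        # Real code would re-pack for the vendor kernel here; this mock
        # simply keeps the dequantized fp16 matrix of shape (out, in).
        self.weight = dequant_int4(packed_weight, scales)

    def forward(self, x: np.ndarray) -> np.ndarray:
        # On real hardware this would invoke the vendor's FP16xINT4
        # matmul kernel; NumPy matmul stands in for it here.
        return x.astype(np.float16) @ self.weight.T
```

The key point from the reply is the load path: the operator owns both the weight conversion (dequant, then kernel-specific pack) and the forward call into the hardware matmul, so swapping hardware means swapping only those two pieces.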
Supporting gpt-oss is a bit complicated and needs scheduling, orz.
> [@Azure-Tang](https://github.com/Azure-Tang) Can you help check? Thank you very much.

Hi, I think you are using the `fp8` yaml, which needs to load special weights. To use IQ1s, you need...