Azure
> @Azure-Tang Hi Tang, thanks for your work on the implementation of TeraPipe on Megatron-LM. Did you try comparing the performance against no TeraPipe? How much benefit does it bring?...
Why is this merge blocked?
Hi, I notice that the core MMA function for bf16 is supported by vLLM's GPTQ Marlin. It seems a few changes could enable this feature. https://github.com/vllm-project/vllm/blob/main/csrc/quantization/gptq_marlin/gptq_marlin.cu#L89 I really need...
> @Azure-Tang Is bf16 support done? Have you made a PR elsewhere?

Not done yet, maybe next week.
> Are there any updates on the AMX int4 progress?

You may find it on the SOSP branch, which will be merged after being fully tested.
> We intend to run DeepSeek-R1 671B inference on the Huawei Ascend 910B. What work is needed to adapt to this new hardware?

We currently have no Ascend devices and no teammates familiar with Ascend hardware, so we may not be able to support you on this. You could check whether the Marlin kernel can run on Ascend cards: https://github.com/IST-DASLab/marlin
> I have one — how can I contact you? The WeChat group is full.

Thank you very much 🙏 But this support probably won't be in the short-term plan. If you want to join the group, we have updated the QR code on the homepage.
> > > We intend to run DeepSeek-R1 671B inference on the Huawei Ascend 910B. What work is needed to adapt to this new hardware?
> >
> > We currently have no Ascend devices and no teammates familiar with Ascend hardware, so we may not be able to support you on this. You could check whether the Marlin kernel can run on Ascend cards: https://github.com/IST-DASLab/marlin
>
> So is the interface between KTransformers and the hardware just Marlin's matrix-multiplication kernel? To adapt new hardware, would it be enough to plug a hardware-specific FP16×INT4 matmul kernel into KTransformers?

It looks that way. You can build a new operator modeled on our linearMarlin operator (in linear.py). You may also need to pay attention to the weight-loading part: we dequantize the GGUF weights first and then use Marlin to pack them into Marlin's weight format. Feel free to reach out if anything is unclear.
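The adaptation path described above can be sketched roughly as follows. This is only an illustrative mock, not the real KTransformers API: the class name `CustomHardwareLinear`, the helper `dequant_int4`, and the packing layout are all hypothetical stand-ins for a hardware-specific operator modeled on linearMarlin, and NumPy matmul stands in for the vendor's FP16×INT4 kernel.

```python
import numpy as np

def dequant_int4(packed: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Unpack two 4-bit values per byte (GGUF-style) and apply
    per-output-channel scales. Layout here is illustrative only."""
    low = (packed & 0x0F).astype(np.int8) - 8        # map [0, 15] -> [-8, 7]
    high = ((packed >> 4) & 0x0F).astype(np.int8) - 8
    w = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.float16)
    w[:, 0::2] = low
    w[:, 1::2] = high
    return w * scales[:, None].astype(np.float16)

class CustomHardwareLinear:
    """Hypothetical stand-in for a new linear operator, following the
    described pattern: dequantize GGUF int4 weights on load, then pack
    them into whatever layout the target hardware's kernel expects."""

    def load(self, packed_weight: np.ndarray, scales: np.ndarray) -> None:
        # Real code would re-pack for the vendor kernel here; this mock
        # simply keeps the dequantized fp16 matrix of shape (out, in).
        self.weight = dequant_int4(packed_weight, scales)

    def forward(self, x: np.ndarray) -> np.ndarray:
        # On real hardware this would invoke the vendor's FP16xINT4
        # matmul kernel; NumPy matmul stands in for it here.
        return x.astype(np.float16) @ self.weight.T
```

The key point from the reply is the load path: the operator owns both the weight conversion (dequant, then kernel-specific pack) and the forward call into the hardware matmul, so swapping hardware means swapping only those two pieces.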
Supporting gpt-oss is a bit complicated and needs scheduling, orz.
> [@Azure-Tang](https://github.com/Azure-Tang) Can you help check? Thank you very much.

Hi, I think you are using the `fp8` yaml, which needs to load special weights. To use IQ1s, you need...