
modify auxiliary loss computation

Open AceCoder0 opened this issue 1 year ago • 0 comments

""" 改动说明: 修改https://huggingface.co/deepseek-ai/DeepSeek-V2/blob/main/modeling_deepseek.py的MoEGate类 补充: Device-Level Balance Loss 和 Communication Balance Loss 的计算 最终aux_loss为3者简单相加(代码:109-149行) 在config.json中添加了M alpha 1, 2, 3 都使用aux_loss_alpha """

AceCoder0 • Jun 13 '24 07:06