ranzhejiang
@songdezhao It seems that qwen3_moe uses a different norm function from qwen3. Can you try this commit to fix the qwen3_moe error? https://github.com/deepspeedai/DeepSpeed/pull/7297/commits/12558b92f540aec04aa7835ce7df522fa68e9f25
> [@ranzhejiang](https://github.com/ranzhejiang) , thanks for the fix. I tested this commit and it worked when I use "sdpa" attention. However, if I change the attention to "flash_attention_2", I still got...
@songdezhao For this issue, both the qwen3 and qwen3_moe models were indeed loaded after our new commits. As for your new error, it seems that you added other new code in...
@songdezhao Thanks for your scripts. This error is hard for me to debug. Do you have any ideas about this problem? I will try to find a GPU machine to solve it.
@songdezhao After debugging, I found that your code triggered a boundary condition for qwen3-moe:
```python
query_states.shape is: torch.Size([2, 0, 17, 128])
```
The root cause is that the CUDA or Triton kernel can...
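A minimal sketch of how a shape like the one reported above could be detected before the tensor is handed to an attention kernel. `has_no_elements` is a hypothetical helper, not part of DeepSpeed or Transformers. It takes a plain tuple; since `torch.Size` is a tuple subclass, `tensor.shape` could be passed as well. What to do when the check fires (skip the call, return an empty output, raise) is left to the caller.

```python
import math


def has_no_elements(shape):
    """Return True when at least one dimension is zero.

    The product of the dimensions is the element count, so a zero
    anywhere in the shape means the tensor holds no data at all.
    """
    return math.prod(shape) == 0


# Shape taken from the report above: the second dimension is 0.
reported_shape = (2, 0, 17, 128)
# Hypothetical non-empty shape, for comparison only.
regular_shape = (2, 5, 17, 128)

for shape in (reported_shape, regular_shape):
    state = "empty" if has_no_elements(shape) else "non-empty"
    print(shape, "->", state)
```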
> Thanks a lot for looking into this. Simply curious: Is this Qwen3-MoE specific or would we expect same issue to happen for other MoE models as well? I am...
> [@songdezhao](https://github.com/songdezhao) After debugging, Your code triggered a boundary condition for qwen3-moe > > query_states.shape is: torch.Size([2, 0, 17, 128] > The root cause is that the cuda or triton...
> Thanks a lot for looking into this. Simply curious: Is this Qwen3-MoE specific or would we expect same issue to happen for other MoE models as well? @songdezhao @loadams...
@wyooyw It seems that you should also delete or comment out https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1072 when you delete https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1079
After reading your example and the DeepSpeed code, I think the main reason is that you specify the "checkpoints" config as:
```python
"checkpoints": glob.glob(os.path.join(model_path, "**", "*" + ".safetensors"), recursive=False),
```
the root...
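For reference, here is a small sketch of how Python's `glob.glob` treats the `**` pattern used in that config, with and without `recursive=True`. It only illustrates standard-library behavior and is not necessarily the root cause described above. The directory layout, file names, and the `find_safetensors` helper are made up for the demonstration.

```python
import glob
import os
import tempfile


def find_safetensors(model_path, recursive):
    """Apply the same pattern as the config above with the given recursive flag."""
    pattern = os.path.join(model_path, "**", "*" + ".safetensors")
    return sorted(glob.glob(pattern, recursive=recursive))


with tempfile.TemporaryDirectory() as model_path:
    # Hypothetical layout: one file at the top level, one in a subdirectory.
    top = os.path.join(model_path, "model.safetensors")
    nested_dir = os.path.join(model_path, "shard")
    os.makedirs(nested_dir)
    nested = os.path.join(nested_dir, "part.safetensors")
    for path in (top, nested):
        open(path, "w").close()

    # With recursive=False, "**" behaves like "*": it matches exactly one
    # directory level, so the top-level file is not found.
    shallow_matches = find_safetensors(model_path, recursive=False)

    # With recursive=True, "**" matches zero or more directory levels,
    # so both files are found.
    deep_matches = find_safetensors(model_path, recursive=True)

print(len(shallow_matches), len(deep_matches))
```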