ZincCat

Maybe we also need to update the docs/main README.

Just fixed it, thanks!

It seems the original kernel is bound to GPT-OSS. I have made it work for Qwen3, but DeepSpeed seems to be causing trouble when I try to merge the experts...
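
For context, a minimal sketch of what merging the experts means here, assuming each expert is a separate `nn.Linear` as in the stock Qwen3 MoE layer; the helper name is hypothetical. Under DeepSpeed ZeRO-3 the per-expert weights are partitioned across ranks, so a naive stack like this would see empty shards, which is presumably where the trouble comes from:

```python
import torch
import torch.nn as nn

def merge_experts(experts: nn.ModuleList) -> nn.Parameter:
    # Stack N per-expert Linear weights into one grouped tensor of shape
    # (num_experts, out_features, in_features), the layout that grouped-GEMM
    # kernels such as MegaBlocks operate on.
    return nn.Parameter(torch.stack([e.weight.detach() for e in experts], dim=0))

# Toy usage with dummy shapes
experts = nn.ModuleList(nn.Linear(16, 32, bias=False) for _ in range(4))
merged = merge_experts(experts)  # shape: (4, 32, 16)
```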

You may refer to the commits in https://github.com/zinccat/qwen3_moe_megablocks.

It's either `self.weight = nn.Parameter(torch.empty(config.num_experts, config.hidden_size, dtype=torch.bfloat16))` or making the class a subclass of `nn.Linear(config.hidden_size, config.num_experts)`, as specified in your reference.
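
To make the two options concrete, here is a minimal sketch of both router formulations, assuming a config exposing `num_experts` and `hidden_size`; the class names are illustrative, not taken from the referenced code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RouterAsParameter(nn.Module):
    # Option 1: a bare (num_experts, hidden_size) weight matrix.
    def __init__(self, config):
        super().__init__()
        self.weight = nn.Parameter(
            torch.empty(config.num_experts, config.hidden_size, dtype=torch.bfloat16)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # F.linear computes hidden_states @ weight.T -> (..., num_experts) logits
        return F.linear(hidden_states, self.weight)

class RouterAsLinear(nn.Linear):
    # Option 2: the same weight layout, inherited from nn.Linear.
    def __init__(self, config):
        super().__init__(config.hidden_size, config.num_experts,
                         bias=False, dtype=torch.bfloat16)
```

Both store a weight of shape `(num_experts, hidden_size)`, so checkpoints load interchangeably between the two.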

It's quite simple: just remove the `_checkpoint_wrapped_model` part from the keys of the model weights.
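
A minimal sketch, assuming the checkpoint is a flat state dict and the prefix string is exactly as in the comment (both assumptions):

```python
import torch

def strip_checkpoint_wrapper(state_dict: dict) -> dict:
    # Drop the activation-checkpoint wrapper segment from every key, e.g.
    # "model._checkpoint_wrapped_model.layers.0.weight" -> "model.layers.0.weight"
    return {k.replace("_checkpoint_wrapped_model.", ""): v
            for k, v in state_dict.items()}

# Hypothetical usage: rewrite keys before loading into the unwrapped model.
# state_dict = torch.load("checkpoint.pt", map_location="cpu")
# model.load_state_dict(strip_checkpoint_wrapper(state_dict))
```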