ZincCat
Maybe we also need to change the docs/main README.
Just fixed it, thanks!
It seems the original kernel is tied to GPT-OSS. I've made it work for Qwen3, but DeepSpeed seems to be causing trouble when I try to merge the experts...
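For context, "merging the experts" here means stacking the per-expert `nn.Linear` weights into single 3-D tensors that a grouped GEMM can consume in one call. A rough sketch of that copy (the `gate_proj`/`up_proj`/`down_proj` attribute names follow the HF Qwen3-MoE layout; the function itself is illustrative, not the actual MegaBlocks API):

```python
import torch

@torch.no_grad()
def merge_expert_weights(experts):
    # experts: per-expert MLP modules, each holding gate_proj / up_proj /
    # down_proj nn.Linear layers (Qwen3-MoE style). Stack them into
    # (num_experts, out_features, in_features) tensors.
    w_gate = torch.stack([e.gate_proj.weight for e in experts])
    w_up = torch.stack([e.up_proj.weight for e in experts])
    w_down = torch.stack([e.down_proj.weight for e in experts])
    return w_gate, w_up, w_down
```

If the DeepSpeed trouble is ZeRO-3 partitioning, a naive stack like this only sees empty shards; gathering the parameters first (e.g. with `deepspeed.zero.GatheredParameters`) might help.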
Sure, I'll provide my version later.
You can refer to the commits in https://github.com/zinccat/qwen3_moe_megablocks.
It's either `self.weight = nn.Parameter(torch.empty(config.num_experts, config.hidden_size, dtype=torch.bfloat16))`, or making the class a subclass of `nn.Linear(config.hidden_size, config.num_experts)`, as specified in your reference.
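Concretely, the two options would look something like this (a sketch; the `config` fields match the message above, the class names are mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Option 1: a bare parameter of shape (num_experts, hidden_size).
class Router(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.weight = nn.Parameter(
            torch.empty(config.num_experts, config.hidden_size, dtype=torch.bfloat16)
        )

    def forward(self, x):
        # (num_tokens, hidden_size) -> (num_tokens, num_experts) router logits
        return F.linear(x, self.weight)

# Option 2: subclass nn.Linear(hidden_size, num_experts) directly.
class LinearRouter(nn.Linear):
    def __init__(self, config):
        super().__init__(config.hidden_size, config.num_experts, bias=False)
```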
I'm currently using a similar approach.
It's quite simple: just strip the `_checkpoint_wrapped_model` part from the model weights' keys.
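Something like this sketch (the helper name is mine; the default prefix below is the one the stock PyTorch checkpoint wrapper inserts, so swap in whatever segment actually appears in your keys):

```python
def strip_wrapper_prefix(state_dict, prefix="_checkpoint_wrapped_module."):
    # Drop the activation-checkpointing wrapper segment from every key, e.g.
    # "model.layers.0._checkpoint_wrapped_module.mlp.gate.weight"
    #   -> "model.layers.0.mlp.gate.weight"
    return {k.replace(prefix, ""): v for k, v in state_dict.items()}
```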