cxxuser
sync_layer.mlp.linear_fc1.layer_norm_weight if dst_pp_rank == pp_rank else None: Qwen3 does not have linear_fc1, so there are some problems with this part when I don't use dist to load.