Daize Dong comments

Results: 8 comments of Daize Dong

[Bugfix] Fix Precision Mismatch in MoE Router of DeepSeek V2/V3 Models and Fused Kernels (BF16 -> FP32)

Could you check whether this version can be merged?

[Bugfix] Fix Precision Mismatch in MoE Router of DeepSeek V2/V3 Models and Fused Kernels (BF16 -> FP32)

> any update on this PR?

Merged the latest branch and resolved conflicts.

[Bugfix] Fix Precision Mismatch in MoE Router of DeepSeek V2/V3 Models and Fused Kernels (BF16 -> FP32)

Seems most benchmarks stay relatively stable, but `ifeval` regresses a lot. Is this reasonable?
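
For context on why router precision can move benchmark numbers at all, here is a minimal sketch of how rounding router logits to BF16 can change which expert is selected. It is not the PR's code and not the DeepSeek router: `to_bf16` and `top1` are hypothetical helpers, the two logit values are made up, and plain top-1 selection stands in for the real routing logic.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a value to the nearest bfloat16 (round-to-nearest-even).

    Hypothetical helper: bfloat16 is emulated by keeping only the upper
    16 bits of the float32 bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Rounding bias: 0x7FFF plus the lowest kept bit gives ties-to-even.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def top1(logits):
    """Index of the largest logit; ties go to the lowest index."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Made-up router logits for two experts that differ by 0.001.
logits_fp32 = [1.0, 1.001]
logits_bf16 = [to_bf16(v) for v in logits_fp32]

# Near 1.0 the bfloat16 spacing is 2**-7, so the 0.001 gap is rounded away
# and the tie-break decides the winner instead of the logits.
expert_fp32 = top1(logits_fp32)
expert_bf16 = top1(logits_bf16)
```

In this toy case the higher-precision path picks expert 1 while the BF16 path collapses both logits to the same value and falls back to expert 0. Whether such flips explain a regression on a specific benchmark is a separate question that the sketch does not answer.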

Qwen3-30B issue: AttributeError: 'MoELayer' object has no attribute 'linear_fc1'

same

Inquiry About K-Means Initialization for Gates Without Fine-Tuning

Thank you for your attention to our project! That is a very good question! Unfortunately, according to our observation, the converted model without further fine-tuning is kind of "broken", i.e.,...

Inquiry About K-Means Initialization for Gates Without Fine-Tuning

@pprp Sorry, we didn't run ablation experiments on the initialization method for the gate weights. However, this method can lead to better balance at initialization, and I believe...

Inquiry About K-Means Initialization for Gates Without Fine-Tuning

@pprp Your images show that both models suffer a large performance loss after initialization, and this observation aligns with ours. I think you need to train the models with more tokens...

author_match parameter is not used

> [gpt_paper_assistant/configs/config.ini](https://github.com/tatsu-lab/gpt_paper_assistant/blob/5fbf2459ef6b95ea0da0baf5ec6a4083f15bcc5d/configs/config.ini#L23)
>
> Line 23 in [5fbf245](/tatsu-lab/gpt_paper_assistant/commit/5fbf2459ef6b95ea0da0baf5ec6a4083f15bcc5d)
>
> `author_match = true`
>
> This parameter seems to not be used. I couldn't find a way to turn off...
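
As an illustration of what honoring such a flag could look like, here is a minimal sketch using Python's standard `configparser`. Only the key `author_match` comes from the quoted config line. The section name `FILTERING`, the helper names `load_author_match` and `select_by_author`, and the paper and author data are all assumptions made for the example, not the repository's actual code.

```python
import configparser

# Hypothetical config text; only the key "author_match" appears in the
# quoted config.ini line. The section name is an assumption.
CONFIG_TEXT = """
[FILTERING]
author_match = true
"""


def load_author_match(text: str, section: str = "FILTERING") -> bool:
    """Read the author_match flag, defaulting to True when it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getboolean(section, "author_match", fallback=True)


def select_by_author(papers, followed_authors, author_match: bool):
    """Return papers with at least one followed author, or nothing if disabled."""
    if not author_match:
        return []
    followed = set(followed_authors)
    return [p for p in papers if followed & set(p["authors"])]


# Made-up data for the example.
papers = [
    {"title": "Paper A", "authors": ["Alice", "Bob"]},
    {"title": "Paper B", "authors": ["Carol"]},
]

enabled = select_by_author(papers, ["Alice"], load_author_match(CONFIG_TEXT))
disabled = select_by_author(
    papers, ["Alice"], load_author_match("[FILTERING]\nauthor_match = false\n")
)
```

With the flag set to `true` the author filter returns the matching paper, and with `false` it returns nothing, which is the on/off behavior the comment is asking about.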