Weihang Wang comments

Results: 9 comments of Weihang Wang

[BUG] zero2 and zero3 has different behavior using the same hyperparameter to train a large model

why is the problem still not solved? :(

tokens routing

> Hey! Not a paper author here, but I'm currently working on reproducing the results of the OpenMoe paper, specifically on token routing. Take a look: https://github.com/Misterion777/moe-experiments/blob/main/notebooks/routing_eda.ipynb Would appreciate any collaboration!...

About In-batch debiased cross-entropy loss

Hello, I have received your email.

Add warning message for beta and gamma parameters

Why have you added warnings only for the initialization process and not for renaming during loading as well? The model I'm using is timm's convnext (which is even the companion...

[Question] What are the differences between two versions of pretrain datasets?

same :( seems it's still not solved

Vllm v0.11.0, Qwen3-VL-235B(-FP8) deployed on 8 A100s OOM

> One more thing is that the model you are using is not quantized to FP8. It is FP16. Hello, thank you for your reply. My launch command follows the...

Vllm v0.11.0, Qwen3-VL-235B(-FP8) deployed on 8 A100s OOM

> One more thing is that the model you are using is not quantized to FP8. It is FP16. I'm curious about this. According to the calculations on the website...
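
For anyone following the FP8 vs FP16 point in this thread, here is a rough back-of-the-envelope sketch of the weight memory involved. It is only an estimate under stated assumptions: the parameter count is the nominal 235B from the model name, the A100s are assumed to be the 80 GB variant (the thread does not say which), and only the weights are counted, so KV cache, activations, and framework overhead come on top.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: nominal parameter count taken from the model name (235B).
NUM_PARAMS = 235e9
GPU_COUNT = 8
# Assumption: 80 GB A100s, treated as roughly 80 GiB each.
# The 40 GB variant would halve the total.
GPU_MEMORY_GIB = 80

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 2 bytes per parameter
fp8_gib = weight_memory_gib(NUM_PARAMS, 1)   # 1 byte per parameter
total_gib = GPU_COUNT * GPU_MEMORY_GIB

print(f"FP16 weights: {fp16_gib:.1f} GiB")
print(f"FP8 weights:  {fp8_gib:.1f} GiB")
print(f"Total GPU memory: {total_gib} GiB")
print(f"Headroom with FP16 weights: {total_gib - fp16_gib:.1f} GiB")
print(f"Headroom with FP8 weights:  {total_gib - fp8_gib:.1f} GiB")
```

The headroom figure is what has to cover the KV cache and activations, so a checkpoint that is actually loaded as FP16 leaves far less room than the FP8 one.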

RuntimeError: CUDA driver error: invalid argument

> > Have you guys added special tokens to your tokenizer without resizing lm_embedding? That leads to a mismatch between the label classes and lm_head. It seems that they are...
