Weihang Wang

9 comments by Weihang Wang

Hello, I have received your email.

> Hey! Not a paper author here, but I'm currently working on reproducing the results of the OpenMoE paper, specifically on token routing. Take a look: https://github.com/Misterion777/moe-experiments/blob/main/notebooks/routing_eda.ipynb Would appreciate any collaboration!...

Hello, I have received your email.

Why have you added warnings only for the initialization process and not for renaming during loading as well? The model I'm using is timm's convnext (which is even the companion...

> One more thing is that the model you are using is not quantized to FP8. It is FP16.

Hello, thank you for your reply. My launch command follows the...

> One more thing is that the model you are using is not quantized to FP8. It is FP16.

I'm curious about this. According to the calculations on the website...

> > Have you added special tokens to your tokenizer without resizing lm_embedding? That leads to a mismatch between the label classes and lm_head. It seems that they are...