ictzyqq

4 comments by ictzyqq

> `qkv` does not change between the pieces of code you marked, so getting the same `qkv` result is expected. `alibi` is applied to `qkv` inside `gpt_attention`. I have a similar...

> I noticed the introduction of `y_null`: > > https://github.com/hpcaitech/Open-Sora/blob/a37a189482a4cd1c7892aa06881e539cbf8078ce/opensora/schedulers/iddpm/__init__.py#L69 > > > But why would concatenation at the sample level affect the output samples? This comes down to classifier-free guidance (CFG). Take a look at the `forward_with_cfg` function at [this link](https://github.com/hpcaitech/Open-Sora/blob/a37a189482a4cd1c7892aa06881e539cbf8078ce/opensora/schedulers/iddpm/__init__.py#L87), which blends the outputs across samples.
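
To make the CFG point concrete, here is a minimal NumPy sketch of the general technique, not the actual Open-Sora code. The names `toy_model` and `forward_with_cfg_sketch` are hypothetical, and the real `forward_with_cfg` may differ in its details. The idea is that the batch is doubled (real condition in the first half, `y_null` in the second), and the two halves are then mixed, which is why sample-level concatenation changes the output.

```python
import numpy as np


def toy_model(x, y):
    # Hypothetical stand-in for the denoiser (an assumption for illustration).
    # It works row by row, so samples in the batch do not interact here.
    return 0.5 * x + y


def forward_with_cfg_sketch(model, x, y, y_null, cfg_scale):
    # Duplicate the latents: the first half is paired with the real condition,
    # the second half with the null (unconditional) condition.
    x_in = np.concatenate([x, x], axis=0)
    y_in = np.concatenate([y, y_null], axis=0)

    out = model(x_in, y_in)

    # Split the doubled batch back into conditional / unconditional outputs.
    cond, uncond = np.split(out, 2, axis=0)

    # This blend is the step where outputs of different batch rows are mixed.
    return uncond + cfg_scale * (cond - uncond)


x = np.ones((2, 3))
y = np.full((2, 3), 2.0)
y_null = np.zeros((2, 3))
guided = forward_with_cfg_sketch(toy_model, x, y, y_null, cfg_scale=3.0)
```

With `cfg_scale=1.0` the result equals the conditional output and with `cfg_scale=0.0` it equals the unconditional one; larger values push the result further away from the unconditional prediction.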

It seems that DeepSeek-V3/R1 served with sglang cannot reach the 88.5/90.8 MMLU accuracy claimed in the paper. How can I reproduce the MMLU accuracy reported there?