ictzyqq comments


[Question] Why does DiT-XL/2 take 119 GFLOPs to generate 256x256 images?

Issue #14 may help you. @void-main
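A rough count helps show where a number like 119 comes from. This is a back-of-the-envelope sketch under stated assumptions: the standard DiT-XL/2 configuration (the 256x256 image is encoded by the VAE into a 32x32x4 latent, patch size 2, hidden size 1152, 28 blocks, MLP ratio 4, 8 output channels because sigma is learned), and the reported figure counting one multiply-accumulate as one FLOP for a single transformer forward pass (VAE and sampling loop excluded). The function name and breakdown are hypothetical, not taken from the DiT codebase.

```python
def dit_forward_macs(latent_hw=32, in_ch=4, patch=2, hidden=1152, depth=28,
                     mlp_ratio=4, out_ch=8, freq_dim=256):
    """Hypothetical helper: approximate multiply-accumulates for ONE DiT forward pass.

    Defaults assume the DiT-XL/2 setup at 256x256 (32x32x4 latent); they are
    assumptions for illustration, not values read from any config file.
    """
    n = (latent_hw // patch) ** 2          # tokens: (32 / 2)^2 = 256
    d = hidden
    patch_dim = patch * patch * in_ch      # flattened patch size fed to the patch embedding

    per_block = (
        3 * n * d * d                      # Q, K, V projections
        + 2 * n * n * d                    # Q @ K^T plus attention-weights @ V
        + n * d * d                        # attention output projection
        + 2 * n * d * (mlp_ratio * d)      # the two MLP linear layers
        + d * 6 * d                        # adaLN modulation: one vector per sample, not per token
    )
    patch_embed = n * patch_dim * d
    cond_embed = freq_dim * d + d * d      # timestep-embedding MLP (class embedding is a table lookup)
    final_layer = d * 2 * d + n * d * (patch * patch * out_ch)
    return depth * per_block + patch_embed + cond_embed + final_layer


print(f"{dit_forward_macs() / 1e9:.1f} G MACs")
```

Under these assumptions the script prints `118.6 G MACs`, which rounds to the ~119 in the question. Nearly all of it comes from the per-block matmuls, roughly 12·N·d² + 2·N²·d per block with N = 256 tokens and d = 1152. Producing an image takes many such forward passes (one per sampling step, with the batch doubled when classifier-free guidance is used), so the per-image cost is far higher than this per-pass figure.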

[Question] Smoothquant data dump?

> `qkv` is not changed between the code you marked, so getting the same `qkv` result is expected. `alibi` is applied to qkv inside `gpt_attention`. I have a similar...
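To see why the dumped `qkv` stays identical, note that ALiBi does not modify q, k or v at all: it adds a per-head linear distance penalty to the attention logits after QK^T is computed. The following is a generic pure-Python sketch of that formulation (causal attention, slopes 2^(-8h/H) for a power-of-two head count); it is not the actual `gpt_attention` kernel, and the function names are hypothetical.

```python
import math


def alibi_slopes(num_heads):
    # Geometric ALiBi slopes for a power-of-two head count: 2^(-8 * h / num_heads), h = 1..num_heads.
    return [2 ** (-8 * h / num_heads) for h in range(1, num_heads + 1)]


def attention_with_alibi(q, k, v, slope):
    # q, k, v: lists of vectors (seq_len x head_dim) for ONE head; they are read, never modified.
    seq_len, d = len(q), len(q[0])
    out = []
    for i in range(seq_len):
        scores = []
        for j in range(i + 1):  # causal mask: query i attends to keys 0..i
            dot = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            scores.append(dot + slope * (j - i))  # ALiBi bias: farther keys get a more negative logit
        mx = max(scores)
        weights = [math.exp(s - mx) for s in scores]  # numerically stable softmax
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out
```

Running the function with the same `q`, `k`, `v` but a different `slope` changes only the attention output, which matches the observation that the `qkv` dump is the same while the final result differs.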

Why is a copy of the latent made for inference, and can it be removed?

> I noticed that `y_null` is introduced here:
>
> https://github.com/hpcaitech/Open-Sora/blob/a37a189482a4cd1c7892aa06881e539cbf8078ce/opensora/schedulers/iddpm/__init__.py#L69
>
> But why does concatenation at the sample level affect the output samples? This involves classifier-free guidance (CFG). You can refer to the `forward_with_cfg` function at this [link](https://github.com/hpcaitech/Open-Sora/blob/a37a189482a4cd1c7892aa06881e539cbf8078ce/opensora/schedulers/iddpm/__init__.py#L87), which fuses the outputs across the paired samples.
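The duplication can be illustrated with a minimal sketch of classifier-free guidance, assuming the usual pattern of running the conditional and unconditional branches in one batched call. `toy_model` and `forward_with_cfg_sketch` are hypothetical stand-ins written for illustration, not the Open-Sora code linked above.

```python
def toy_model(latents, conds):
    # Hypothetical denoiser stand-in: its output depends on both the latent and the condition.
    return [[x * 0.5 + c for x in lat] for lat, c in zip(latents, conds)]


def forward_with_cfg_sketch(latents, y, y_null, cfg_scale):
    batch = len(latents)
    # Duplicate the latents: first half is paired with the real conditions y, second half with y_null.
    combined = latents + latents
    conds = list(y) + [y_null] * batch
    out = toy_model(combined, conds)  # one batched forward pass instead of two separate ones
    cond_out, uncond_out = out[:batch], out[batch:]
    # CFG: extrapolate from the unconditional prediction toward the conditional one.
    return [[u + cfg_scale * (c - u) for c, u in zip(co, uo)]
            for co, uo in zip(cond_out, uncond_out)]


print(forward_with_cfg_sketch([[1.0, 2.0]], [1.0], 0.0, 4.0))
```

This is why concatenating samples changes the result: the two halves are fused into one guided prediction. With `cfg_scale = 1` the formula reduces to the conditional output, so the duplicated (unconditional) half only matters when guidance is enabled; it could also be computed in a separate pass instead of a concatenated batch, trading speed for memory.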

[Track] DeepSeek V3/R1 accuracy

It seems that DeepSeek-V3/R1 served with SGLang cannot reach the MMLU accuracy of 88.5/90.8 claimed in the paper? How can the paper's MMLU accuracy be reproduced?
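One detail worth pinning down when comparing against a reported MMLU number is how per-question results are aggregated. The sketch below computes both common conventions, the per-question (micro) average and the per-subject (macro) average; the input format and function name are hypothetical, and which convention the paper or any SGLang benchmark script uses is an assumption to verify, not something stated here.

```python
def mmlu_scores(results):
    # results: hypothetical dict mapping subject name -> list of booleans (True = answered correctly).
    total_correct = sum(sum(answers) for answers in results.values())
    total = sum(len(answers) for answers in results.values())
    micro = total_correct / total  # every question weighted equally
    macro = sum(sum(a) / len(a) for a in results.values()) / len(results)  # every subject weighted equally
    return micro, macro


print(mmlu_scores({"subject_a": [True, True, True, False], "subject_b": [False, True]}))
```

On unevenly sized subjects the two averages differ, so a gap of this kind can appear even when every individual answer matches.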