WangXingyu comments

Results: 5 comments by WangXingyu

Add global default device

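> About this set/get_global_default_device:
>
> * Does torch have a counterpart?
> * What does "global" mean here? We already have global tensor, so reusing the word "global" needs careful thought.

* PyTorch 2.0 supports this: [torch.set_default_device](https://pytorch.org/docs/stable/generated/torch.set_default_device.html).
* There were two considerations behind using "global": first, this flag is a global value; second, the original plan was to support both local tensors and global tensors, but because constructing a placement requires passing in...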
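The first sense of "global" mentioned above (a process-wide flag) can be sketched in plain Python. This is only an illustration under stated assumptions: it is not OneFlow's or PyTorch's implementation, `make_tensor` is a hypothetical stand-in for a tensor factory, and the setter/getter names simply mirror the ones discussed in the PR.

```python
# Hypothetical sketch of a process-wide default device flag.
# Not OneFlow's actual code; make_tensor is an illustrative stand-in.
from typing import Iterable, Optional

_default_device = "cpu"  # module-level state shared by every caller


def set_global_default_device(device: str) -> None:
    """Change the default device for all later factory calls."""
    global _default_device
    _default_device = device


def get_global_default_device() -> str:
    """Return the current process-wide default device."""
    return _default_device


def make_tensor(data: Iterable[float], device: Optional[str] = None) -> dict:
    """Build a fake 'tensor'; an explicit device overrides the default."""
    chosen = device if device is not None else get_global_default_device()
    return {"data": list(data), "device": chosen}
```

An explicit `device` argument still takes precedence here, so the flag only changes what happens when the caller says nothing.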

Add global default device

> > > About this set/get_global_default_device:
> > >
> > > * Does torch have a counterpart?
> > > * What does "global" mean here? We already have global tensor, so reusing the word "global" needs careful thought.
> > >
> > >...

Add global default device

> Also, regarding global tensor: we previously provided a global mode, which looks like the same kind of feature (global mode is meant for global tensors), so the two can be considered together.
>
> https://oneflow.readthedocs.io/en/master/generated/oneflow.utils.global_view.global_mode.html#oneflow.utils.global_view.global_mode

OK, sounds good.

[BUG] Qwen3 MoE with FSDP2 meets `torch.utils.checkpoint.CheckpointError` when `offload_policy=True`

+1, SFT hits the same issue.

[Bug]: qwen2-vl 7b, on vllm 0.8.1 & 0.8.2, sometimes (not deterministically but depends on data) I got: ValueError: Attempted to assign 702 = 702 multimodal tokens to 703 placeholders

> I think it may be because the multimodal embeddings are merged into the text embeddings before sampling is done. So none of the sampling parameters can avoid this problem....