Walker

5 comments by Walker

You should check model.save().

Maybe the experiment name in your yml file is the same as this one, and auto_resume is also set?

In this case, mixed precision for `replicate_with_fsdp` should be handled by `fully_shard` instead of AMP. This means we need to modify `maybe_enable_amp()` in `torchtitan/distributed/utils.py` to accommodate `replicate_with_fsdp`. By the way,...
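A minimal sketch of the branching described in this comment, not the actual torchtitan code: `ParallelFlags`, `choose_amp_context`, and the `make_autocast` callback are hypothetical names introduced only for illustration. The idea is that any path built on `fully_shard` skips AMP, and everything else falls back to an autocast context.

```python
from contextlib import nullcontext
from dataclasses import dataclass


@dataclass
class ParallelFlags:
    # Hypothetical stand-in for the trainer's parallelism settings.
    fsdp_enabled: bool = False
    replicate_with_fsdp: bool = False  # replication implemented via fully_shard


def choose_amp_context(flags, make_autocast):
    """Return the context manager the forward pass should run under."""
    if flags.fsdp_enabled or flags.replicate_with_fsdp:
        # Assumption: fully_shard applies its own mixed-precision policy,
        # so wrapping the step in AMP as well would be redundant.
        return nullcontext()
    # Plain replication without fully_shard: let the caller build autocast.
    return make_autocast()
```

In a real training loop, `make_autocast` could be something like `lambda: torch.autocast("cuda", dtype=torch.bfloat16)`; it is passed in as a callback here so the sketch runs without PyTorch installed.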

> my request changes is mainly on 2d mesh. we should target 1d mesh for landing. it's a user contract in public facing api

I think the use of 2D...

By the way, unlike `set_requires_gradient_sync`, `set_requires_all_reduce` does not incur an additional memory burden. @tianyu-l