Xilun Wu

12 comments by Xilun Wu

Xilun starts to investigate the implementation of DS2 and collect training sets of reasonable size.

The speedup of testing is quite impressive!! Congrats!

> > Thanks! One suggestion for unit testing would be to create a DeviceMesh in FakeMode to reproduce the issue that I had! Or maybe create a DeviceMesh inside of...

Re-enabled DTensor tests on CPU in #118134.

Note: this test requires https://github.com/pytorch/pytorch/pull/126924 to land first.

You can also try DTensor `local_map`, which is how we enabled FusedRMSNorm in torchtitan (#404). This is the second approach in @yifuwang's comment.

I think this argument currently serves as a placeholder and may be used in the future. What do you think? @lessw2020 @tianyu-l

https://github.com/pytorch/torchtune/blob/main/torchtune/training/checkpointing/_checkpoint_client.py#L344-L346 This will all-gather the optimizer state dict on ranks, which could lead to high memory usage. Is this desired? @pradeepfn @calvinpelletier

I believe [`local_map`](https://pytorch.org/docs/main/distributed.tensor.html#torch.distributed.tensor.experimental.local_map) is a good fit for this case, to implement a custom `clip_grad_norm_` for DTensor. @zijian-hu let me draft a PR based on your sample so that we...
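For context, here is a rough sketch of the arithmetic a sharded `clip_grad_norm_` has to get right. It is plain Python rather than the DTensor or `local_map` API, and the function names (`local_sum_of_squares`, `clip_sharded_grads`) are hypothetical. Each inner list stands in for the gradient shard held by one rank, and the outer `sum` stands in for an all-reduce. The clipping coefficient follows the usual `max_norm / (total_norm + eps)` form capped at 1.0.

```python
import math


def local_sum_of_squares(shard):
    """Per-rank work: squared 2-norm of the local gradient shard."""
    return sum(g * g for g in shard)


def clip_sharded_grads(shards, max_norm, eps=1e-6):
    """Clip gradients split across ranks by their global 2-norm.

    `shards` is a list of per-rank gradient lists (an assumption made for
    this illustration; real code would operate on tensors).
    """
    # Stand-in for an all-reduce(SUM) of the per-rank partial results.
    total_sq = sum(local_sum_of_squares(s) for s in shards)
    total_norm = math.sqrt(total_sq)

    # Scale down only when the global norm exceeds max_norm.
    coef = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * coef for g in shard] for shard in shards]
    return total_norm, clipped


# Two "ranks", each holding half of the gradient vector [3, 0, 0, 4].
shards = [[3.0, 0.0], [0.0, 4.0]]
total_norm, clipped = clip_sharded_grads(shards, max_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for shard in clipped for g in shard))
```

The point of the sketch is that the norm must be computed from the sum of squares across all shards before taking the square root. Taking a norm per shard and combining those incorrectly is the typical failure mode a custom implementation needs to avoid.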