Chien-Chin Huang
@yzhangcs https://github.com/pytorch/pytorch/pull/148825 fixes the issue.
@yzhangcs The PR has landed. You should be able to get it with the next PyTorch nightly build. Please let me know if that completely resolves the issue. I can...
@yzhangcs I'll close the issue. Let me know if you still encounter issues.
@galalalala The configuration you suggested is not valid. With your proposed sharding strategy, a global batch (assuming a batch size of 8 and a sequence length of 8192) needs to first be sharded...
@galalalala Yes.
Thanks for the PR. I feel we can remove `.github/scripts/update_version.sh` and just use `importlib.metadata` to get the version. But that is orthogonal to this PR.
If one installs the TorchTitan package correctly, the version should be in the metadata. And since this PR makes the version dynamic, it will automatically reflect the latest version with git...
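As a sketch of the `importlib.metadata` approach mentioned above (the distribution name `torchtitan` is an assumption here; the actual name comes from the package metadata):

```python
from importlib.metadata import PackageNotFoundError, version

# "torchtitan" is an assumed distribution name for illustration;
# look up the installed package's metadata rather than a hardcoded
# version string or a version-bump script.
try:
    pkg_version = version("torchtitan")
except PackageNotFoundError:
    # Package not installed (e.g. running from a source checkout).
    pkg_version = "unknown"

print(pkg_version)
```

With a dynamic version (e.g. derived from git via the build backend), this lookup reflects whatever version was recorded at install time, with no separate update script to keep in sync.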
Since CI is currently not available, please refrain from landing the PR until we have some CI signals.
@d4l3k It seems that `write_state_dict` and `read_state_dict` won't work with DTensor. Please correct me if I'm wrong.