torchrec
support sub group collective plan
Summary: PP requires non-contiguous DMP sharding. In today's torchrec planner, ranks are assumed to be contiguous in various places, which prevents intra-host pipeline parallelism from utilizing NVLink.
This set of changes:
- introduces `device_ranks` in `Topology`, defaulting to `list(range(world_size))`, which matches today's behavior; callers can instead pass in the specific topology they want (see the sketch after this list).
- changes lists to dicts in various places, since the contiguity assumption no longer holds.
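A minimal sketch of the idea, assuming a heavily simplified `Topology` (the real class in torchrec's planner carries much more state). `device_ranks` is the parameter this diff introduces; the constructor signature, device mapping, and 8-GPUs-per-host layout below are illustrative assumptions:

```python
from typing import Dict, List, Optional


class Topology:
    def __init__(
        self,
        world_size: int,
        compute_device: str = "cuda",
        device_ranks: Optional[List[int]] = None,
    ) -> None:
        # Default preserves today's behavior: contiguous ranks 0..world_size-1.
        self.device_ranks: List[int] = (
            device_ranks if device_ranks is not None else list(range(world_size))
        )
        # Per-rank state is keyed by rank id rather than held in a list
        # indexed by position, since position and rank no longer coincide
        # once ranks may be non-contiguous.
        self.devices: Dict[int, str] = {
            rank: f"{compute_device}:{rank % 8}" for rank in self.device_ranks
        }


# A pipeline-parallel caller can hand the planner just one stage's ranks,
# e.g. the even ranks of an 8-GPU host, so the sharding group stays on
# NVLink-connected GPUs.
topology = Topology(world_size=8, device_ranks=[0, 2, 4, 6])
print(topology.device_ranks)       # [0, 2, 4, 6]
print(sorted(topology.devices))    # [0, 2, 4, 6]
```

With this shape, planner code that previously did `devices[i]` for a list position now looks up `devices[rank]` for an explicit rank id, which is why the list-to-dict changes accompany the new parameter.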
Differential Revision: D55482028
This pull request was exported from Phabricator. Differential Revision: D55482028