
support sub group collective plan

Open · xunnanxu opened this issue 1 year ago · 11 comments

Summary: Pipeline parallelism (PP) requires non-contiguous DistributedModelParallel (DMP) sharding. The current torchrec planner assumes in several places that ranks are contiguous, which prevents intra-host pipeline parallelism from using NVLink.


This set of changes:

  1. Introduces `device_ranks` in `Topology`, defaulting to `list(range(world_size))`, which matches today's behavior. Callers can pass in a specific topology instead.
  2. Changes lists to dicts in various places, since the contiguous-rank assumption no longer holds.
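The two changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not torchrec's actual `Topology` class or planner code: `SketchTopology` and `per_rank_storage` are hypothetical names, and the rank list `[0, 2, 4, 6]` is only an example of a non-contiguous sub-group. It shows `device_ranks` falling back to `list(range(world_size))`, and why per-rank data keyed by a dict still works when a list indexed by rank would not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchTopology:
    """Hypothetical stand-in for the planner's Topology; not torchrec's real class."""

    world_size: int
    # Global ranks this plan targets; None means "use the contiguous default".
    device_ranks: Optional[List[int]] = None

    def __post_init__(self) -> None:
        if self.device_ranks is None:
            # Default matches today's behavior: ranks 0..world_size-1.
            self.device_ranks = list(range(self.world_size))


def per_rank_storage(topology: SketchTopology, bytes_per_device: int) -> Dict[int, int]:
    # Hypothetical helper: keyed by global rank rather than list position,
    # so non-contiguous ranks such as [0, 2, 4, 6] are addressed correctly.
    assert topology.device_ranks is not None
    return {rank: bytes_per_device for rank in topology.device_ranks}


default = SketchTopology(world_size=4)
print(default.device_ranks)  # [0, 1, 2, 3]

# Illustrative non-contiguous sub-group, e.g. one pipeline stage on one host.
stage = SketchTopology(world_size=4, device_ranks=[0, 2, 4, 6])
print(per_rank_storage(stage, 1024))  # {0: 1024, 2: 1024, 4: 1024, 6: 1024}

# A list sized by world_size and indexed by rank has no slot for rank 6.
as_list = [1024] * stage.world_size
try:
    as_list[6]
except IndexError:
    print("list indexed by rank breaks for non-contiguous ranks")
```

Keying by rank keeps lookups valid regardless of which global ranks a sub-group contains, which is the motivation for replacing lists with dicts.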

Differential Revision: D55482028

xunnanxu · Mar 29 '24

This pull request was exported from Phabricator. Differential Revision: D55482028

facebook-github-bot · Mar 29 '24