Use Torchrec OSS Planner (2/2)

Open ge0405 opened this issue 3 years ago • 4 comments

Summary: Incorporate TorchRec's OSS planner into train_module. The OSS planner is enabled by setting the "use_torchrec_oss_planner" flag to true in SharderOptions (https://fburl.com/code/xjlbrscm). When the flag is true, run_planner calls run_oss_planner and returns a plan if one can be found. If no plan is found, ShardingError is raised, which matches the current behavior of run_planner. The current planner integration does not support UVM hybrid mode, because the plan search can take more than 20 minutes, which is too long for a dry run.
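The flag-based dispatch described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this diff: `SharderOptions` and `ShardingError` are redefined here as minimal stand-ins, and the callables passed to `run_planner` (including the name `run_legacy_planner`) are hypothetical placeholders for the real planner entry points.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Stand-in for the real plan type; assumed here to map table name -> sharding type.
Plan = Dict[str, str]


class ShardingError(Exception):
    """Stand-in for the error that run_planner raises when no plan is found."""


@dataclass
class SharderOptions:
    # Minimal stand-in: only the flag named in the summary is modeled.
    use_torchrec_oss_planner: bool = False


def run_planner(
    options: SharderOptions,
    run_oss_planner: Callable[[], Optional[Plan]],
    run_legacy_planner: Callable[[], Optional[Plan]],
) -> Plan:
    """Pick a planner based on the flag; raise if the chosen one finds no plan."""
    search = run_oss_planner if options.use_torchrec_oss_planner else run_legacy_planner
    plan = search()
    if plan is None:
        # Same failure mode on both paths, so callers do not need to change.
        raise ShardingError("no feasible sharding plan found")
    return plan
```

Raising the same error on both paths is what keeps the flag a drop-in switch for existing callers of run_planner.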

The components for the OSS planner are as follows:
(1) Topology: both hbm_cap and ddr_cap come from planner_storage_in_gb. Since planner_storage_in_gb["hbm"] removes reserved_hbm_size when being constructed, we need to add reserved_hbm_size back for planner to see the whole hbm storage.
(2) reserved_storage: HeuristicalStorageReservation, with the reserved percentage coming from reserved_hbm_size/total_hbm_storage.
(3) perf_estimator: not specified, so the default EmbeddingPerfEstimator is used.
(4) storage_estimator: not specified, so the default EmbeddingStorageEstimator is used.
(5) constraints: currently sharding_types, compute_kernels, pooling factors and min_partition are specified. In the future, caching_ratio (that takes user-specified reserved_hbm_size_for_cache) can be added.
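The arithmetic behind components (1) and (2) can be illustrated with a small sketch. It assumes that reserved_hbm_size is expressed in GB like planner_storage_in_gb, that the dict has a "ddr" key alongside "hbm", and that the capacities are converted to bytes before being handed to the planner; the helper name `planner_inputs` is hypothetical and does not appear in the diff.

```python
from typing import Dict

GB = 1024 ** 3  # assumed conversion factor from GB to bytes


def planner_inputs(
    planner_storage_in_gb: Dict[str, float],
    reserved_hbm_size_gb: float,
) -> Dict[str, float]:
    """Derive topology capacities and the reservation percentage."""
    # planner_storage_in_gb["hbm"] already excludes the reserved HBM,
    # so add it back to recover the full per-device HBM.
    total_hbm_gb = planner_storage_in_gb["hbm"] + reserved_hbm_size_gb
    return {
        # Would feed the topology's hbm_cap / ddr_cap.
        "hbm_cap": int(total_hbm_gb * GB),
        "ddr_cap": int(planner_storage_in_gb["ddr"] * GB),
        # Would feed the storage reservation: reserved_hbm_size / total_hbm_storage.
        "reserved_percentage": reserved_hbm_size_gb / total_hbm_gb,
    }


# Example: 28 GB usable HBM after a 4 GB reservation, 64 GB DDR.
inputs = planner_inputs({"hbm": 28.0, "ddr": 64.0}, reserved_hbm_size_gb=4.0)
print(inputs["reserved_percentage"])  # 0.125, i.e. 4 GB out of 32 GB
```

Adding the reservation back before computing the percentage avoids counting it twice: the planner sees the full 32 GB and then sets aside 12.5% of it itself.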

Differential Revision: D37948466

ge0405 avatar Aug 04 '22 20:08 ge0405

This pull request was exported from Phabricator. Differential Revision: D37948466

facebook-github-bot avatar Aug 04 '22 20:08 facebook-github-bot
