MCore Partial DistOpt Feature

Open sanandaraj5597 opened this issue 1 year ago • 1 comments

What does this PR do ?

This PR adds an interface argument to support the MCore Partial DistOpt feature. The argument `partial_data_parallel_shard_factor` determines the level of sharding applied to the MCore DistOpt across the DP domain.

The setting needs to be specified as `training.model.optim.partial_data_parallel_shard_factor=2` in the configuration YAML file.
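As a rough illustration of where that key would sit, the sketch below expands the dotted override into the nested mapping a YAML file would express. The helper `expand_override` is hypothetical and not part of NeMo or MCore; the nesting is inferred only from the dotted path given above, and a real configuration will contain many other keys.

```python
def expand_override(override: str) -> dict:
    """Expand a dotted 'a.b.c=value' override into a nested dict.

    Hypothetical helper for illustration only; not a NeMo API.
    """
    dotted_key, raw_value = override.split("=", 1)
    # Assumption: the shard factor is an integer, as in the example value 2.
    nested = int(raw_value)
    # Wrap the value from the innermost key outwards.
    for part in reversed(dotted_key.split(".")):
        nested = {part: nested}
    return nested


config = expand_override(
    "training.model.optim.partial_data_parallel_shard_factor=2"
)
print(config)
```

Read as YAML, this corresponds to `partial_data_parallel_shard_factor: 2` nested under `training`, `model`, and `optim`, assuming the file mirrors the dotted path exactly.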

sanandaraj5597, Oct 01 '24 05:10

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions[bot], Oct 16 '24 01:10

This PR was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], Oct 24 '24 01:10

[🤖]: Hi @sanandaraj5597 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully

So it might be time to merge this PR or get some approvals

I'm just a bot, so I'll leave it up to you what to do next.

//cc @pablo-garay @ko3n1g

github-actions[bot], Jan 08 '25 01:01