MCore Partial DistOpt Feature
What does this PR do?
This PR adds an interface argument to support the MCore Partial DistOpt feature. The argument partial_data_parallel_shard_factor determines the level of sharding applied to the MCore DistOpt across the DP domain.
The setting must be specified in the configuration YAML file as training.model.optim.partial_data_parallel_shard_factor=2.
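As a rough sketch, the dotted path above would correspond to a nested YAML fragment like the following. The nesting is inferred only from the dotted key in this description; the surrounding keys of the real configuration file are omitted, and the actual layout may differ.

```yaml
# Sketch only: nesting inferred from the dotted path
# training.model.optim.partial_data_parallel_shard_factor
training:
  model:
    optim:
      # Level of sharding for the MCore DistOpt across the DP domain
      # (value 2 is the one given in this PR description)
      partial_data_parallel_shard_factor: 2
```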
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
[🤖]: Hi @sanandaraj5597 👋,
We wanted to let you know that a CICD pipeline for this PR just finished successfully.
So it might be time to merge this PR or get some approvals.
I'm just a bot, so I'll leave it to you what to do next.
//cc @pablo-garay @ko3n1g