Organize SM-modelparallelv2 per orchestrator
In its current form, the SM-modelparallelv2 test case contains various files that are not grouped by orchestrator. This issue tracks organizing them per orchestrator:
- kubernetes/train.yaml
- slurm/train.sbatch
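
For reference, a minimal sketch of what `slurm/train.sbatch` might look like under this layout. This is an assumption, not the repository's actual script: it presumes a `torchrun`-based launch and a placeholder entry point named `train.py`; node counts, GPU counts, and script names would need to match the real SM-modelparallelv2 test case.

```bash
#!/bin/bash
#SBATCH --job-name=smp-v2-train      # hypothetical job name
#SBATCH --nodes=2                    # number of training nodes (adjust as needed)
#SBATCH --ntasks-per-node=1          # one launcher task per node
#SBATCH --gres=gpu:8                 # assumes 8 GPUs per node
#SBATCH --exclusive
#SBATCH --output=logs/%x_%j.out

# Use the first node of the allocation as the rendezvous endpoint
MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
MASTER_PORT=29500

# Launch one torchrun per node; train.py is a placeholder for the
# actual SMP v2 training entry point.
srun torchrun \
    --nnodes="$SLURM_NNODES" \
    --nproc_per_node=8 \
    --rdzv_backend=c10d \
    --rdzv_endpoint="${MASTER_ADDR}:${MASTER_PORT}" \
    train.py
```

The `kubernetes/train.yaml` counterpart would carry the equivalent launch configuration for the Kubernetes orchestrator, so each orchestrator directory is self-contained.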