awsome-distributed-training
Cluster creation fails due to invalid json in provisioning_parameters.json
The provisioning_parameters.json file must be valid JSON, or cluster creation will fail. For example, the following file is missing the value for partition_name:
{
"version": "1.0.0",
"workload_manager": "slurm",
"controller_group": "controller-machine",
"worker_groups": [
{
"instance_group_name": "worker-group-1",
"partition_name":
}
],
"fsx_dns_name": "fs-05dac34e835f2c...fsx.us-west-2.amazonaws.com",
"fsx_mountname": "4owup..."
}
This file passes the validator but still causes cluster creation to fail.
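A pre-flight check that tries to parse the file would catch this class of error before cluster creation starts. The sketch below uses Python's standard json module and reports the line and column where parsing fails. The function name check_provisioning_json is hypothetical. The rule that every entry in worker_groups needs a non-empty partition_name is an assumption inferred from the example above, not from a documented schema. This is not the fix from the linked pull request.

```python
import json
from typing import List


def check_provisioning_json(text: str) -> List[str]:
    """Return a list of problems found in provisioning parameters text.

    An empty list means the text parsed as a JSON object and passed the
    (assumed) worker-group check below.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError exposes the 1-based line/column of the failure.
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]

    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    groups = data.get("worker_groups", [])
    if not isinstance(groups, list):
        return ["worker_groups must be a list"]

    problems: List[str] = []
    for i, group in enumerate(groups):
        # ASSUMPTION (hypothetical rule): each worker group needs a
        # non-empty string partition_name, as in the example above.
        name = group.get("partition_name") if isinstance(group, dict) else None
        if not isinstance(name, str) or not name.strip():
            problems.append(f"worker_groups[{i}] has no non-empty partition_name")
    return problems
```

To check a file before creating the cluster, read it with `open(path, encoding="utf-8").read()`, pass the text to check_provisioning_json, and stop if any messages are returned. For the example above, parsing fails with "Expecting value" at the closing brace that follows `"partition_name":`.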
This is addressed in https://github.com/aws-samples/awsome-distributed-training/pull/233
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.