
Cluster creation fails due to invalid JSON in provisioning_parameters.json

Open sean-smith opened this issue 1 year ago • 1 comment

The provisioning_parameters.json file must be valid JSON, or cluster creation will fail. For example, the following JSON is missing the partition_name value:

{
  "version": "1.0.0",
  "workload_manager": "slurm",
  "controller_group": "controller-machine",
  "worker_groups": [
    {
      "instance_group_name": "worker-group-1",
      "partition_name":
    }
  ],
  "fsx_dns_name": "fs-05dac34e835f2c...fsx.us-west-2.amazonaws.com",
  "fsx_mountname": "4owup..."
}

This passes the validator but fails cluster creation.

This is addressed in https://github.com/aws-samples/awsome-distributed-training/pull/233
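
Until that fix is in place, a strict parse of the file before cluster creation would catch this class of error. The following is a minimal sketch, not the repository's validator: the function name `check_provisioning_parameters` is hypothetical, and it only checks that the text parses as JSON using Python's standard `json` module. It does not check the schema or required keys.

```python
import json


def check_provisioning_parameters(text):
    """Return (True, None) if text parses as JSON, else (False, message).

    Hypothetical helper for illustration; it checks syntax only.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError exposes the position of the first syntax error.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


# Trimmed version of the example above: "partition_name" has no value.
broken = """{
  "version": "1.0.0",
  "worker_groups": [
    {
      "instance_group_name": "worker-group-1",
      "partition_name":
    }
  ]
}"""

ok, problem = check_provisioning_parameters(broken)
if not ok:
    print(f"provisioning_parameters.json is not valid JSON: {problem}")
```

In practice the text would be read from the actual provisioning_parameters.json file, and the check would run before the file is uploaded for cluster creation.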

sean-smith, Apr 01 '24 08:04

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Jul 01 '24 01:07

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot], Aug 30 '24 01:08