aws-eda-slurm-cluster
Running install.sh with --cdk-cmd update in rapid succession can damage the cluster
I ran install.sh with --cdk-cmd update to change my instance selections. Then I realized I wanted an additional change, so I modified my config file and ran the update again. Unfortunately, this corrupted my cluster because the second command was run before the first update had finished. The second command tried to do a rollback, and the rollback failed.
Can we add a check that the CloudFormation stack is not in an "IN PROGRESS" state before install.sh is allowed to run an update?
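A minimal sketch of what such a check might look like, assuming the installer can call boto3 and knows the stack name and region. The function names `is_operation_in_progress`, `get_stack_status`, and `ensure_stack_is_idle` are hypothetical and are not part of the current installer. It relies on the fact that CloudFormation stack statuses for in-flight operations (for example `UPDATE_IN_PROGRESS` or `UPDATE_ROLLBACK_IN_PROGRESS`) end in `_IN_PROGRESS`.

```python
import sys


def is_operation_in_progress(stack_status: str) -> bool:
    """Return True if the CloudFormation status describes an in-flight operation.

    In-flight statuses such as UPDATE_IN_PROGRESS, UPDATE_ROLLBACK_IN_PROGRESS,
    and UPDATE_COMPLETE_CLEANUP_IN_PROGRESS all end in "_IN_PROGRESS".
    """
    return stack_status.endswith("_IN_PROGRESS")


def get_stack_status(stack_name: str, region: str) -> str:
    """Look up the current status of a stack (hypothetical helper).

    Raises botocore.exceptions.ClientError if the stack does not exist.
    """
    import boto3  # imported lazily so the pure check above works without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    response = cfn.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def ensure_stack_is_idle(stack_name: str, region: str) -> None:
    """Exit with an error message if the stack is still being modified."""
    status = get_stack_status(stack_name, region)
    if is_operation_in_progress(status):
        sys.exit(
            f"Stack {stack_name} is currently {status}. "
            "Wait for the operation to finish before running another update."
        )
```

The installer could call `ensure_stack_is_idle(stack_name, region)` just before it starts the update. A stack that has never been deployed would need separate handling, since `describe_stacks` raises an error in that case.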
To reproduce, change some instances in your config and run the update, then change the config again and rerun the update before the first one completes.