
[Feature] autojoin retry logic

Open • knisbet opened this issue 4 years ago • 2 comments

Currently we embed significant logic in Terraform for bootstrapping AWS clusters. Ideally, some of this logic would move into the autojoin code embedded in Gravity, so that the behavior is better defined and the bootstrapping scripts become simpler.

Items of note:

  • [x] Retrieval of the ServiceURL and join token should retry and wait until the keys have been created
  • [x] On cluster reinstall, the check that the ServiceURL is available should move into autojoin (so that it waits for the new cluster's key)
  • [ ] For a multi-master install, the election of which master runs the install could be codified as well

Current leader election: https://github.com/gravitational/terraform-gravity/blob/ec026d9d3d64485fec29c8bca7e34a8a88cb0a7f/aws/cloud-init/master.script.tpl#L42-L53

Waiting is currently implemented by waiting for a file to be copied from the leader: https://github.com/gravitational/terraform-gravity/blob/master/aws/cloud-init/master.script.tpl#L284-L297
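One way the election could be codified is a deterministic rule that every master evaluates locally. The sketch below assumes each master can see the same list of master IDs (for example, instance IDs) and picks the smallest one in sorted order; `electLeader` and `isLeader` are hypothetical helpers, and this is not the rule used by the linked Terraform script.

```go
package main

import (
	"errors"
	"sort"
)

// electLeader picks the install leader deterministically: the smallest ID
// in sorted order. Masters that evaluate it over the same set of IDs all get
// the same answer, so no extra coordination is needed under that assumption.
func electLeader(masterIDs []string) (string, error) {
	if len(masterIDs) == 0 {
		return "", errors.New("no masters to elect from")
	}
	ids := append([]string(nil), masterIDs...) // copy so the caller's slice is untouched
	sort.Strings(ids)
	return ids[0], nil
}

// isLeader reports whether self should run the install.
func isLeader(self string, masterIDs []string) bool {
	leader, err := electLeader(masterIDs)
	return err == nil && leader == self
}
```

The non-leaders would then wait for the new cluster's ServiceURL and join token rather than for a copied file. The rule is only safe if all masters agree on the membership list, which the caller has to guarantee.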

knisbet • Oct 08 '19 16:10

If the service URL and token are empty, autojoin should behave as if they haven't been created yet. That allows Terraform to create the SSM keys with empty values, and to clean them up when a cluster is deleted.

helgi • Oct 08 '19 16:10

Some of this is addressed in PR https://github.com/gravitational/gravity/pull/803

knisbet • Oct 17 '19 15:10