
(*System) Shutdown signature && bigmachine.Machine unique IDs

Open DazWilkin opened this issue 4 years ago • 5 comments

The signature of Start is:

(*System) Start(ctx context.Context, count int) ([]*bigmachine.Machine, error)

Whereas (its converse) Shutdown is:

(*System) Shutdown()

It feels as though it would be more consistent if Shutdown's signature took both a context.Context and a []*bigmachine.Machine, and also returned an error.

Even then, the bigmachine.Machine type does not include a unique ID for the machine (beyond an IP address, which is often not used as a key). Would it make sense to add one?

I'm not retaining the list of machines created by Start in the GCE implementation. So when asked to Shutdown, I must first enumerate all the instances that (I think) have been created (I'm doing this by tag, though I could potentially use the IP) and then delete them.

DazWilkin, Oct 29 '19 22:10

(*System) Shutdown() is for shutting down the system implementation (e.g., to serialize its internal state), not for shutting down individual machines.

A couple of things:

  1. The way machine shutdown works is not to explicitly shut them down, but rather to arrange for them to die once keepalives are no longer maintained. This has some nice "end-to-end" properties: in particular, it doesn't matter how the driver dies (gracefully, unexpectedly, or due to a network partition), the machine will eventually shut itself down, since a dead driver no longer maintains keepalives. This is done by the Supervisor service.

  2. About names. Yes, currently bigmachine names each machine by its Addr. This is also how machines can talk to each other (via Dial). While this works, it is also slightly problematic: Addrs can be recycled. For example, a machine could die, and another could come up on the same address. I plan to also generate what is effectively a GUID, so that a machine's name becomes its address concatenated with a GUID.

mariusae, Oct 30 '19 16:10

Thank you.

I misunderstood.

I'll review the EC2 implementation as I'm unclear how to "arrange for them to die" on GCE.

DazWilkin, Oct 30 '19 17:10

On EC2, the way we do this is to set the instance shutdown behavior to "terminate", and then we instruct systemd to shut down the OS when the process fails.

mariusae, Oct 30 '19 17:10

I may have to be more explicit about this.

I think there's no way for GCE instances to delete themselves on shutdown or failure.

DazWilkin, Oct 30 '19 22:10

Okay, maybe there's a way to invoke the GCE API from the command line once the process exits?
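One way to follow that suggestion, sketched under assumptions: a small wrapper runs the worker, and when it exits for any reason, looks up the VM's own name and zone from the metadata server and runs `gcloud compute instances delete` on itself. The metadata paths (`instance/name`, `instance/zone`) and the `Metadata-Flavor: Google` header follow the standard Compute Engine metadata conventions; the wrapper itself, the `./worker` path, and the helper names are hypothetical, and the VM's service account is assumed to have permission to delete instances.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"os/exec"
	"path"
	"strings"
	"time"
)

const metadataBase = "http://metadata.google.internal/computeMetadata/v1/instance/"

var metadataClient = &http.Client{Timeout: 5 * time.Second}

// metadata fetches one instance attribute from the GCE metadata server.
func metadata(key string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, metadataBase+key, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := metadataClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("metadata %s: %s", key, resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	return strings.TrimSpace(string(b)), err
}

// deleteArgs builds gcloud arguments to delete the named instance. The metadata
// server reports the zone as "projects/NUMBER/zones/ZONE"; only ZONE is kept.
func deleteArgs(name, zonePath string) []string {
	return []string{"compute", "instances", "delete", name, "--zone", path.Base(zonePath), "--quiet"}
}

func main() {
	// Run the worker (hypothetical path) and wait for it to exit, whatever the cause.
	worker := exec.Command("./worker", os.Args[1:]...)
	worker.Stdout, worker.Stderr = os.Stdout, os.Stderr
	if err := worker.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "worker exited:", err)
	}

	// Look up this VM's own identity, then ask gcloud to delete it.
	name, err := metadata("name")
	if err != nil {
		fmt.Fprintln(os.Stderr, "metadata:", err)
		os.Exit(1)
	}
	zone, err := metadata("zone")
	if err != nil {
		fmt.Fprintln(os.Stderr, "metadata:", err)
		os.Exit(1)
	}
	del := exec.Command("gcloud", deleteArgs(name, zone)...)
	del.Stdout, del.Stderr = os.Stdout, os.Stderr
	if err := del.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "self-delete failed:", err)
		os.Exit(1)
	}
}
```

This only covers the worker exiting; if the wrapper itself is killed, nothing deletes the VM, which is why hooking the same step into the init system (as with systemd on EC2) may be more robust.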

mariusae, Nov 01 '19 03:11