
Potential orphaned workers

Open Raynos opened this issue 10 years ago • 4 comments

@kriskowal noted that he had a set of child processes floating around without a master process.

There is a possibility that a child process might get orphaned and float around even after the master is killed, due to magic race conditions.

It'd be nice if nanny had a way to effectively "GC" these worker processes.

Raynos, Oct 22 '14 21:10

One solution might be something like:

  • nanny creates an append only log of { type: 'spawned', pid: pid } and { type: 'dead', pid: pid }.
  • whenever nanny starts a child process it writes a spawned message to the log
  • whenever the nanny master process gets spawned at application startup, it goes through the log and, for each spawned pid with no matching dead message, checks whether that process is still alive. It then kills any floating pids and writes dead messages to the log.

This ensures that:

  • all spawned processes are tracked
  • we GC once on nanny master startup.

This solution should be pretty simple, requires very little bookkeeping, and ensures that any leftover workers get killed whenever nanny boots up.
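As a rough sketch of the log-and-GC idea above (not nanny's actual code): the function names, the one-JSON-record-per-line file format, and the `logPath` argument are all assumptions made for illustration. It also trusts logged pids blindly, so a pid that has since been reused by an unrelated process would be killed too.

```javascript
const fs = require('fs');

// Append one JSON record per line, e.g. {"type":"spawned","pid":123}.
// The line-per-record format is an assumption, not something nanny defines.
function appendRecord(logPath, record) {
  fs.appendFileSync(logPath, JSON.stringify(record) + '\n');
}

// Replay the log and return pids that were spawned but never marked dead.
function pendingPids(logPath) {
  if (!fs.existsSync(logPath)) return [];
  const pending = new Set();
  const lines = fs.readFileSync(logPath, 'utf8').split('\n').filter(Boolean);
  for (const line of lines) {
    const record = JSON.parse(line);
    if (record.type === 'spawned') pending.add(record.pid);
    if (record.type === 'dead') pending.delete(record.pid);
  }
  return Array.from(pending);
}

// Signal 0 delivers nothing; it only tests whether the pid exists.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // exists, but we may not signal it
  }
}

// Run once at master startup: kill leftovers, then record them as dead.
// `alive` and `kill` are injectable so the replay logic can be exercised
// without touching real processes.
function collectOrphans(
  logPath,
  alive = isAlive,
  kill = (pid) => process.kill(pid, 'SIGTERM')
) {
  const killed = [];
  for (const pid of pendingPids(logPath)) {
    if (alive(pid)) {
      try {
        kill(pid);
        killed.push(pid);
      } catch (err) {
        // The process exited between the check and the kill; nothing to do.
      }
    }
    appendRecord(logPath, { type: 'dead', pid: pid });
  }
  return killed;
}
```

The master would call `appendRecord(logPath, { type: 'spawned', pid: child.pid })` after each spawn and `collectOrphans(logPath)` once at boot.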

This does not handle the case where you shut a nanny controller master process down on a server and NEVER IN THE FUTURE EVER start a nanny master process.

However, assuming at least one nanny process runs on the box at some point in the future, all workers will be eventually consistently dead.

Raynos, Oct 22 '14 21:10

This also does not account for reused process identifiers after a reboot. When I last considered a similar feature, I started looking into a way to uniquely identify a boot session, such that [pid, session] would be unique for any process on a particular system, but haven’t turned up any ideas.
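One possible way to get a [pid, session] pair, sketched here only as an assumption and limited to Linux: the kernel exposes a random UUID that changes on every boot at `/proc/sys/kernel/random/boot_id`. The `boot` field and the helper names below are hypothetical. This covers pid reuse across reboots only; a pid can still be recycled within a single boot once the pid counter wraps.

```javascript
const fs = require('fs');

// Linux-only assumption: a random UUID regenerated on each boot.
const BOOT_ID_PATH = '/proc/sys/kernel/random/boot_id';

// Returns the boot id, or null where the file is missing (e.g. on macOS).
function readBootId(path = BOOT_ID_PATH) {
  try {
    return fs.readFileSync(path, 'utf8').trim();
  } catch (err) {
    return null;
  }
}

// Hypothetical record shape: the spawned entry also carries the boot id.
function spawnedRecord(pid, bootId) {
  return { type: 'spawned', pid: pid, boot: bootId };
}

// Only trust a logged pid if it was recorded during the current boot.
// With no boot id available we cannot tell, so we answer false.
function isFromCurrentBoot(record, currentBootId) {
  return currentBootId !== null && record.boot === currentBootId;
}
```

A startup GC pass could then skip (and simply mark dead) any record for which `isFromCurrentBoot` returns false, since that process cannot have survived the reboot.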

kriskowal, Oct 22 '14 21:10

@kriskowal makes a good point: pids are not unique, so we will have to write some kind of unique identifier to the append-only log.

I would assume this is a solved problem (process uniqueness) and we can just do a bit of research to figure out the correct thing.

Raynos, Oct 22 '14 21:10

I think the previous solution was that both the parent and the children would kill the children in specific circumstances. It seems like in this case the children can kill themselves if their parent's pid is gone or the process at that pid will not communicate with them (e.g. the pid was recycled).
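A minimal sketch of the child-side check, assuming a Node worker; `exitWhenOrphaned` and `parentIsGone` are made-up names, not nanny APIs. It polls the parent pid and, when the worker has an IPC channel, also treats a closed channel as the parent being gone.

```javascript
// Signal 0 delivers nothing; process.kill throws if the pid does not exist.
function isAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // exists, but we may not signal it
  }
}

// Pure check: has the parent we started with gone away?
// On POSIX systems an orphan is re-parented, so its ppid changes.
function parentIsGone(originalPpid, currentPpid, alive = isAlive) {
  return currentPpid !== originalPpid || !alive(originalPpid);
}

// Hypothetical helper a worker would call once at startup.
function exitWhenOrphaned(intervalMs = 1000, onOrphan = () => process.exit(1)) {
  const originalPpid = process.ppid;
  const timer = setInterval(() => {
    if (parentIsGone(originalPpid, process.ppid)) {
      clearInterval(timer);
      onOrphan();
    }
  }, intervalMs);
  timer.unref(); // do not keep the worker alive just for this check

  // If the worker was forked with an IPC channel, a closed channel is a
  // quicker sign that the parent will not talk to us any more.
  if (process.send) process.once('disconnect', onOrphan);
  return timer;
}
```

Polling alone cannot tell a recycled parent pid from the real parent, which is why the sketch also compares `process.ppid` against the value seen at startup and listens for `disconnect`.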

sh1mmer, Oct 27 '14 19:10