Subsequent restarts of a process with a delay

Open axelson opened this issue 4 years ago • 7 comments

In Parent's moduledocs there is the following line:

An attempt to restart a child which failed to restart is considered as a crash and contributes to the restart intensity. Thus, if a child repeatedly fails to restart, the parent will give up at some point, according to restart intensity settings.

Is it possible to customize that logic somehow so that a child that fails to restart is treated similarly to a child that fails to start? Or would that require creating my own custom parent process?

Also, it may be possible that there's a better approach to accomplish what I'm trying to do. My scenario is that I have a Parent.GenServer that has a single child. I want to restart that child with a delay, similar to this example: https://github.com/sasa1977/parent#restarting-with-a-delay

And if that child fails again, I want to restart it with a delay again. But what happens is the restart immediately fails and is retried 3 times in a row at which point my Parent.GenServer fails and the application is brought down. I know this isn't the scenario that traditional supervisors are meant to handle, but I was hoping that this is a supported use-case for Parent (ideally without having to create a custom parent process).

axelson avatar Nov 15 '20 21:11 axelson

When you return the child with return_children, a restart is recorded, which can lead to the parent terminating itself if too many restarts occur within the max_seconds interval (the default limit is 3 restarts in 5 seconds).

You can avoid this in two ways:

  1. In handle_stopped_children, instead of using return_children, start the new instance of the child using Parent.start_child which won't bump the restart counter.
  2. Alternatively, provide the max_restarts: :infinity option to Parent.GenServer.start_link.
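To make the restart-intensity limit concrete, here is a minimal sketch of a sliding-window restart counter. It is an illustration only, not Parent's internal implementation: the `RestartWindow` module and its `record/2` function are hypothetical, and the rule "more than max_restarts within max_seconds" follows the OTP supervisor convention, so Parent's exact bookkeeping may differ.

```elixir
defmodule RestartWindow do
  # Hypothetical illustration of restart intensity; not part of Parent.
  defstruct max_restarts: 3, max_seconds: 5, timestamps: []

  # With max_restarts: :infinity, no number of restarts trips the limit.
  def record(%__MODULE__{max_restarts: :infinity} = window, _now), do: {:ok, window}

  # Records a restart at `now` (in seconds). Returns {:ok, window} while the
  # limit holds, or :too_many once more than max_restarts fall in the window.
  def record(%__MODULE__{} = window, now) do
    recent = Enum.filter([now | window.timestamps], &(now - &1 <= window.max_seconds))

    if length(recent) > window.max_restarts do
      :too_many
    else
      {:ok, %{window | timestamps: recent}}
    end
  end
end
```

Under these assumptions, three restarts in quick succession are tolerated and a fourth within the same 5 seconds trips the limit, while restarts spaced more than 5 seconds apart never do.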

sasa1977 avatar Nov 15 '20 22:11 sasa1977

Neither of those options gives me quite what I'm looking for.

With 1. the child process is only restarted once even though it keeps crashing.

With 2. the Parent supervisor indeed no longer dies, but I see restarts happening continuously and handle_stopped_children is not being called. I would expect handle_stopped_children to be called after the first restart (although it's possible that I'm missing something about how it works).

Here's my supervisor with both options shown (and option 2 commented out): https://github.com/axelson/govee_phx/blob/22839a57c34297ab7e1a352b4c0b7b15a3a08f7e/lib/govee_phx_application/ble_supervisor.ex

axelson avatar Nov 16 '20 00:11 axelson

With 1, you need to check the result of start_child, and if it's an error, you need to invoke Process.send_after to schedule another start attempt in the future.
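The retry loop described here can be sketched with a plain GenServer standing in for the parent process. This is an assumption-laden illustration, not code from the Parent library: `DelayedRestarter`, `start_fun`, and the 100 ms delay are hypothetical, and inside a real Parent.GenServer the marked call would be Parent.start_child instead.

```elixir
defmodule DelayedRestarter do
  # Hypothetical stand-in for the parent process; not part of Parent.
  use GenServer

  @retry_delay 100

  # `start_fun` is a zero-arity function returning {:ok, pid} | {:error, reason}.
  # It plays the role of the Parent.start_child call discussed in the thread.
  def start_link(start_fun), do: GenServer.start_link(__MODULE__, start_fun)

  @impl GenServer
  def init(start_fun) do
    # Trap exits so a crashing linked child does not take this process down.
    Process.flag(:trap_exit, true)
    {:ok, try_start(%{start_fun: start_fun, child: nil, attempts: 0})}
  end

  @impl GenServer
  def handle_info(:start_child, state), do: {:noreply, try_start(state)}

  # The running child stopped: schedule a delayed start attempt.
  def handle_info({:EXIT, pid, _reason}, %{child: pid} = state) do
    Process.send_after(self(), :start_child, @retry_delay)
    {:noreply, %{state | child: nil}}
  end

  # Exit signals from other processes, e.g. children that failed while starting.
  def handle_info({:EXIT, _pid, _reason}, state), do: {:noreply, state}

  defp try_start(state) do
    state = %{state | attempts: state.attempts + 1}

    # In a Parent.GenServer, this is where Parent.start_child would be called.
    case state.start_fun.() do
      {:ok, pid} ->
        %{state | child: pid}

      {:error, _reason} ->
        # The start failed: check the result and schedule another attempt.
        Process.send_after(self(), :start_child, @retry_delay)
        state
    end
  end
end
```

The key point is that both a failed start and a later crash end up in the same place: a Process.send_after call that schedules the next attempt, so no attempt happens in a tight loop.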

I'll need to investigate the behaviour of 2 some more.

sasa1977 avatar Nov 16 '20 00:11 sasa1977

Okay, scheduling the restart again based on the output of start_child is working well. Maybe the problem with 2. is related to the child failing immediately.

axelson avatar Nov 16 '20 00:11 axelson

I've pushed a commit to master which should fix problem 2. With that change, if a temporary child fails to start during a restart, parent will treat it as a crash, and the child will be removed (if ephemeral) or marked as stopped (if not ephemeral).

However, I still advise you to use approach 1, or even better, to rework your code to ensure that the process always starts successfully (e.g. by deferring potentially failing work to handle_continue).
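As a rough illustration of deferring failing work to handle_continue, the sketch below uses only standard GenServer callbacks. The `DeferredConnect` module, the `connect_fun` argument, and the 100 ms retry delay are hypothetical names chosen for this example, not anything from Parent or blue_heron.

```elixir
defmodule DeferredConnect do
  # Hypothetical example; `connect_fun` stands for any work that may fail.
  use GenServer

  @retry_delay 100

  # `connect_fun` is a zero-arity function returning {:ok, conn} | {:error, reason}.
  def start_link(connect_fun), do: GenServer.start_link(__MODULE__, connect_fun)

  def connected?(server), do: GenServer.call(server, :connected?)

  @impl GenServer
  def init(connect_fun) do
    # init/1 always succeeds; the risky work runs afterwards in handle_continue/2.
    {:ok, %{connect_fun: connect_fun, conn: nil}, {:continue, :connect}}
  end

  @impl GenServer
  def handle_continue(:connect, state), do: {:noreply, connect(state)}

  @impl GenServer
  def handle_info(:connect, state), do: {:noreply, connect(state)}

  @impl GenServer
  def handle_call(:connected?, _from, state), do: {:reply, state.conn != nil, state}

  defp connect(state) do
    case state.connect_fun.() do
      {:ok, conn} ->
        %{state | conn: conn}

      {:error, _reason} ->
        # Stay alive and retry later instead of failing the start.
        Process.send_after(self(), :connect, @retry_delay)
        state
    end
  end
end
```

Because the process is already running when the connection attempt fails, the parent never sees a failed start, and the retry policy lives entirely inside the child.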

sasa1977 avatar Nov 16 '20 16:11 sasa1977

Great, thanks! I do agree that it would be better to rework the code to ensure that the process starts successfully, but that isn't feasible in this case since I'm relying on an alpha library: https://github.com/blue-heron/blue_heron

Also if 1. is recommended over 2., then should an example of 1. be added to the README?

axelson avatar Nov 16 '20 17:11 axelson

> Also if 1. is recommended over 2., then should an example of 1. be added to the README?

I recommend 1 in this particular case, and 2 otherwise (when a process always starts normally). I'll think about adding a comment someplace in the docs, but probably not in the README, which is designed as a quick showcase of some scenarios, not a detailed treatment of all possible edge cases.

sasa1977 avatar Nov 16 '20 18:11 sasa1977