Subsequent restarts of a process with a delay
In `Parent`'s moduledocs there is the following line:
An attempt to restart a child which failed to restart is considered as a crash and contributes to the restart intensity. Thus, if a child repeatedly fails to restart, the parent will give up at some point, according to restart intensity settings.
Is it possible to customize that logic somehow so that a child that fails to restart is treated similarly to a child that fails to start? Or would that require creating my own custom parent process?
Also, it may be possible that there's a better approach to accomplish what I'm trying to do. My scenario is that I have a `Parent.GenServer` that has a single child. I want to restart that child with a delay, similar to this example: https://github.com/sasa1977/parent#restarting-with-a-delay
And if that child fails again, I want to restart it with a delay again. But what happens is the restart immediately fails and is retried 3 times in a row, at which point my `Parent.GenServer` fails and the application is brought down. I know this isn't the scenario that traditional supervisors are meant to handle, but I was hoping that this is a supported use-case for `Parent` (ideally without having to create a custom parent process).
When you return the child with `return_children`, a restart is recorded, which can lead to the parent terminating itself if too many restarts occur within the `max_seconds` interval (default limit is 3 restarts in 5 seconds).
You can avoid this in two ways:
- In `handle_stopped_children`, instead of using `return_children`, start the new instance of the child using `Parent.start_child`, which won't bump the restart counter.
- Alternatively, provide the `max_restarts: :infinity` option to `Parent.GenServer.start_link`.
Neither of those options gives me quite what I'm looking for.
With 1. the child process is only restarted once even though it keeps crashing.
With 2. the Parent supervisor indeed no longer dies, but I see restarts happening continuously and `handle_stopped_children` is not being called. I would expect `handle_stopped_children` to be called after the first restart (although it's possible that I'm missing something about how it works).
Here's my supervisor with both options shown (and option 2 commented out): https://github.com/axelson/govee_phx/blob/22839a57c34297ab7e1a352b4c0b7b15a3a08f7e/lib/govee_phx_application/ble_supervisor.ex
With 1, you need to check the result of `start_child`, and if it's an error, you need to invoke `Process.send_after` to schedule another start attempt in the future.
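The retry loop described above can be sketched with a plain `GenServer`, so that it runs without the Parent library. This is only an illustration under stated assumptions: `DelayedRestarter` and `start_fun` are hypothetical names, `start_fun` stands in for a call such as `Parent.start_child(child_spec)` and is assumed to return `{:ok, pid}` or `{:error, reason}`, and the `:DOWN` clause plays the role that `handle_stopped_children` would play in a `Parent.GenServer`.

```elixir
defmodule DelayedRestarter do
  use GenServer

  # `start_fun` is a hypothetical zero-arity function standing in for
  # `Parent.start_child(child_spec)`. It is assumed to return
  # {:ok, pid} or {:error, reason}.
  def start_link(start_fun, delay_ms \\ 1_000) do
    GenServer.start_link(__MODULE__, {start_fun, delay_ms})
  end

  @impl GenServer
  def init({start_fun, delay_ms}) do
    # Trigger the first start attempt right after init returns.
    send(self(), :start_child)
    {:ok, %{start_fun: start_fun, delay_ms: delay_ms, child: nil}}
  end

  @impl GenServer
  def handle_info(:start_child, state) do
    case state.start_fun.() do
      {:ok, pid} ->
        # Monitor the child so that a later crash schedules a new attempt.
        Process.monitor(pid)
        {:noreply, %{state | child: pid}}

      {:error, _reason} ->
        # The start failed: schedule another attempt instead of giving up.
        Process.send_after(self(), :start_child, state.delay_ms)
        {:noreply, state}
    end
  end

  def handle_info({:DOWN, _ref, :process, pid, _reason}, %{child: pid} = state) do
    # The running child stopped: restart it after the delay.
    Process.send_after(self(), :start_child, state.delay_ms)
    {:noreply, %{state | child: nil}}
  end
end
```

In a real `Parent.GenServer`, the same `case` on the start result would presumably live in the `handle_info` clause that performs the delayed start, with `handle_stopped_children` doing the initial scheduling.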
I'll need to investigate the behaviour of 2 some more.
Okay, scheduling the restart again based on the output of `start_child` is working well. Maybe the problem with 2. is related to the child failing immediately.
I've pushed a commit to master which should fix problem 2. With that change, if a temp child fails to start during a restart, parent will treat it as a crash, and the child will be removed (if ephemeral) or marked as stopped (if not ephemeral).
However, I still advise you to use approach 1, or even better, to rework your code to ensure that the process always starts successfully (e.g. by deferring potentially failing work to `handle_continue`).
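As a rough sketch of the `handle_continue` suggestion, the module below returns from `init/1` immediately and performs the fallible step afterwards, retrying on failure. `DeferredInit`, `connect_fun`, and `retry_ms` are hypothetical names chosen for the example; `connect_fun` is assumed to return `{:ok, conn}` or `{:error, reason}`.

```elixir
defmodule DeferredInit do
  use GenServer

  # `connect_fun` is a hypothetical zero-arity function for the work that
  # may fail. It is assumed to return {:ok, conn} or {:error, reason}.
  def start_link(connect_fun, retry_ms \\ 1_000) do
    GenServer.start_link(__MODULE__, {connect_fun, retry_ms})
  end

  @impl GenServer
  def init({connect_fun, retry_ms}) do
    state = %{connect_fun: connect_fun, retry_ms: retry_ms, conn: nil}
    # init/1 always succeeds; the risky work runs in handle_continue/2.
    {:ok, state, {:continue, :connect}}
  end

  @impl GenServer
  def handle_continue(:connect, state) do
    case state.connect_fun.() do
      {:ok, conn} ->
        {:noreply, %{state | conn: conn}}

      {:error, _reason} ->
        # Stay alive and try again later instead of crashing during start.
        Process.send_after(self(), :retry_connect, state.retry_ms)
        {:noreply, state}
    end
  end

  @impl GenServer
  def handle_info(:retry_connect, state) do
    {:noreply, state, {:continue, :connect}}
  end
end
```

Because `start_link` returns `{:ok, pid}` even when the first connection attempt fails, the parent never sees a failed start, so the restart intensity is not affected by that failure.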
Great, thanks! I do agree that it would be better to rework the code to ensure that the process starts successfully, but that isn't feasible in this case since I'm relying on an alpha library: https://github.com/blue-heron/blue_heron
Also if 1. is recommended over 2., then should an example of 1. be added to the README?
> Also if 1. is recommended over 2., then should an example of 1. be added to the README?
I recommend 1 in this particular case, and 2 otherwise (i.e. when the process always starts successfully). I'll think about adding a comment someplace in the docs, but probably not in the README, which is designed as a quick showcase of some scenarios, not a detailed treatment of all possible edge cases.