otp
otp copied to clipboard
Actor's `init` taking longer than `init_timeout` causes entire program to crash
Like the title describes, I've run into an issue where if the inittakes too long instead of returning an error then actor.start_spec will crash the entire program with the following error:
Runtime terminating during boot (killed)
Crash dump is being written to: erl_crash.dump...done
I'm using Erlang/OTP 27 and have tried on Erlang/OTP 25 with the same result.
Below is a code snippet that demonstrates the issue.
import gleam/erlang/process
import gleam/option
import gleam/otp/actor
pub type State {
State(Int)
}
pub type Message {
Increment
}
pub fn main() {
start_actor()
process.sleep_forever()
}
pub fn start_actor() {
let spec: actor.Spec(State, Message) =
actor.Spec(init:, init_timeout: 10, loop:)
actor.start_spec(spec)
}
pub fn init() {
process.sleep(1_000_000)
actor.Ready(State(0), process.new_selector())
}
pub fn loop(_message, state) {
actor.Continue(state, option.None)
}
Hello! the program is crashing as the linked child process is timing out and it is not supervised, so the exit continues to propagates until it reaches a process trapping exits (there is not one in this case) or until it reaches the root process, crashing the program.
Do you have some particular issue?
If this is the expected behavior then maybe I'm misunderstanding the documentation, or something specific about my implementation. It says here:
Start an actor from a given specification. If the actor’s init function returns an error or does not return within init_timeout then an error is returned.
Based on the documentation my assumption was that the code above wouldn't crash but instead would instead return Error(StatError). Maybe the language "returns an error" is a bit overloaded here and it would be clearer to re-word it?
Ah yes, a bug for sure. Thank you.
I presume we want it to not crash but I've not the time to sit down and think through the implications of that and how we would implement it.
I'm also thinking back to something @crowdhailer said years back, that perhaps init should not fail and starting an actor shouldn't return a result. Could be a good thing to think about now as we're looking at making some breaking changes to support naming.
My opinion has not changed. But I don't have any new compelling arguments.
What were your old arguments? If you can remember the details 😄
I think that you should always be able to validate the initial state before calling an init. And you shouldn't rely on anther pid/service without you actor understanding failure in reaching that service. I think Async init was a problem. I dunno, I'd have to play again to make better arguments