otp
otp copied to clipboard
Possible Bug: `try_await_forever` letting panic through
My understanding of task.try_await_forever
is that it should catch when a process panics and return an error. However it seems to cause the calling process to panic
pub fn main() {
io.println("Try await a panicking task")
let _ = task.try_await_forever(task.async(fn() { panic }))
// This line is never reached
io.println("Try await caught the error")
}
Is this a bug or am I misunderstanding the function?
For errors not to propagate you'll need to make the process that spawns the task to trap exits.
I agree though, this API is a bit confusing, I'm not sure what the intention is for how and when to use it. @TanklesXL do you remember better than I do?
From what I remember when I tried this my assumption was that trap exits was enabled by default. Maybe this might be something we want as a parameter to task.async or maybe a separate function, something like a task.async_trap
? It just seemed weird to me that trap exits wasn't the default behaviour since it would be easy to return an error if the task process crashed
I suspect part of the confusion here is that when I used this last I assumed the await function would be able to preemptively handle any trapping but I don't think that makes sense, this is something that needs to be set up when the task process is spawned
For the historical context side of things try_await_forever was added after the original try_await to allow for certain cases where you don't necessarily care or want a timeout on your task
So for context I think this used to work? I ran into the issue trying to update testbldr
for the new gleam version, which included updating the gleam_otp dependency.
I wouldn't be surprised if it never actually trapped exits, I remember having to do some funky rescue
shenanigans in gladvent
I've verified that this behaviour of not trapping exits is independent of whether you use task.await
or task.await_forever
. Upon further inspection I suspect there's a race condition in task.async()
where the process.monitor_process
may only be called after the task process has already crashed:
pub fn async(work: fn() -> value) -> Task(value) {
...
let pid =
process.start(linked: True, running: fn() { process.send(subject, work()) })
let monitor = process.monitor_process(pid)
...
|> process.selecting_process_down(monitor, FromMonitor)
...
}
Notice that the process can only be monitored after the task has already been spawned, this is likely what happens.
What i find weird though is that when I add a sleep to the task process the exit still isn't trapped, which leads me to believe there might be a bug in monitor_process
on top of the race condition, note the following test still crashes even though the sleep should allow the task process to be monitored correctly:
pub fn async_await_forever_panic_test() {
task.async(fn() {
process.sleep(2000)
panic as "panic"
})
|> task.await_forever()
|> should.be_error
}
If i add a process.trap_exits
to the beginning of the test I am able to successfully trap the exit message and cause task.await_forever
to fail with a match error
since it's not being set up with selecting_trapped_exits
but instead since it's depending on monitor_process
it's using selecting_process_down
.
Update: I'm playing around with using spawn_monitor
rather than spawn_link
in an attempt to address that race condition but I'm getting badarg
in the demonitor
function, I need to investigate this further 🤔
Sounds like there's no clear use for this function. Trapping exits is not something one should do unless you're implementing a supervisor, and will cause all sorts of problems if it's enabled temporarily as concurrent processes could crash and cause unexpected behaviour.
Let's remove this function.
Sounds like there's no clear use for this function
I'm not sure how you ended up at this conclusion after what i said earlier 😆 i use these task functions all the time, the only caveat here is that the user needs to know that if a crash occurs it'll crash the calling process
Which do you mean to remove? The issue occurs regardless of which await
you use? This is not unique to try_await
or try_await_forever
it is also present in await
and await_forever
.
unless you mean remove the task
module as a whole?
Any functions that require the user to enable trapping exits should be removed. I think that is only try_await_forever
as try_await
returns an error in the case of a timeout.
Any functions that require the user to enable trapping exits should be removed. I think that is only try_await_forever as try_await returns an error in the case of a timeout.
It does yes but they both still allow a panic to crash the calling process, the only difference is the timeout so they both require trapping exits
A crash is supposed to crash the calling process, Task is not an exception catching system, that's what supervisors are for.
so this has really been bugging me so i played around a bit more and managed to fix these issues usingspawn_monitor
(not currently included in gleam/erlang/process), it required a change to add process_monitor_from_ref
in gleam/erlang/process
(this can also be achieved by making that type non-opaque)
in gleam/erlang/process:
pub fn process_monitor_from_ref(ref: Reference) -> ProcessMonitor {
ProcessMonitor(ref)
}
the problem was indeed with how we are attaching monitors to processes after the fact (no idea if there is a bug in the monitor_process
function but that's a separate conversation...
@external(erlang, "erlang", "spawn_monitor")
fn spawn_monitor(running f: fn() -> any) -> #(Pid, Reference)
/// Spawn a task process that calls a given function in order to perform some
/// work. The result of this function is send back to the parent and can be
/// received using the `await` function.
///
/// See the top level module documentation for more information on async/await.
///
pub fn async(work: fn() -> value) -> Task(value) {
let owner = process.self()
let subject = process.new_subject()
let #(pid, ref) =
spawn_monitor(running: fn() { process.send(subject, work()) })
let monitor = process.process_monitor_from_ref(ref)
let selector =
process.new_selector()
|> process.selecting_process_down(monitor, FromMonitor)
|> process.selecting(subject, FromSubject)
Task(owner: owner, pid: pid, monitor: monitor, selector: selector)
}
corresponding passing test cases:
pub fn try_await_forever_panic_test() {
// Start with an empty mailbox
flush()
// Perform an asynchronous task
// and monitor it until it's done
let _result =
task.async(fn() { panic as "hehe" })
|> task.try_await_forever
|> should.be_error()
// Mailbox should be empty;
// no "DOWN" message should have been sent
process.self()
|> get_message_queue_length
|> should.equal(0)
}
pub fn try_await_panic_test() {
// Start with an empty mailbox
flush()
// Perform an asynchronous task
// and monitor it until it's done
let _result =
task.async(fn() { panic as "haha" })
|> task.try_await(1000)
|> should.be_error()
// Mailbox should be empty;
// no "DOWN" message should have been sent
process.self()
|> get_message_queue_length
|> should.equal(0)
}
We have decided on Discord to rewrite the Task module using the Elixir one as a reference.