
Possible Bug: `try_await_forever` letting panic through

Open · bcpeinhardt opened this issue 7 months ago • 15 comments

My understanding of `task.try_await_forever` is that it should catch the case where a process panics and return an error. However, it seems to cause the calling process to panic instead:

pub fn main() {
  io.println("Try await a panicking task")
  let _ = task.try_await_forever(task.async(fn() { panic }))

  // This line is never reached
  io.println("Try await caught the error")
}

Is this a bug or am I misunderstanding the function?

bcpeinhardt avatar Nov 11 '23 07:11 bcpeinhardt

For errors not to propagate you'll need to make the process that spawns the task trap exits.
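
For illustration, a minimal sketch of that setup (`trap_exits` lives in `gleam/erlang/process`); note that, as discussed further down the thread, this alone still doesn't make `try_await_forever` return an error, because the task's selector isn't set up for trapped exits:

pub fn main() {
  // Trap exits in the parent so a linked task's crash is delivered as a
  // message rather than taking this process down with it.
  process.trap_exits(True)
  let _ = task.try_await_forever(task.async(fn() { panic }))
}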

I agree though, this API is a bit confusing. I'm not sure what the intention is for how and when to use it. @TanklesXL do you remember better than I do?

lpil avatar Nov 11 '23 12:11 lpil

From what I remember, when I tried this my assumption was that trapping exits was enabled by default. Maybe this is something we want as a parameter to `task.async`, or maybe a separate function, something like a `task.async_trap`? It just seemed weird to me that trapping exits wasn't the default behaviour, since it would be easy to return an error if the task process crashed.
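
As a purely hypothetical sketch of that idea (no such function exists in gleam_otp), `task.async_trap` might look like this; the trap has to be enabled in the calling process, since the parent is the one that receives the exit signal from the linked task:

pub fn async_trap(work: fn() -> value) -> Task(value) {
  // Hypothetical: enable trapping in the parent before spawning the
  // linked task, so its crash arrives as a message.
  process.trap_exits(True)
  async(work)
}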

TanklesXL avatar Nov 11 '23 16:11 TanklesXL

I suspect part of the confusion here is that when I used this last I assumed the await function would be able to preemptively handle any trapping, but I don't think that makes sense: this is something that needs to be set up when the task process is spawned.

TanklesXL avatar Nov 11 '23 16:11 TanklesXL

For the historical context side of things, `try_await_forever` was added after the original `try_await` to allow for cases where you don't necessarily want or care about a timeout on your task.

TanklesXL avatar Nov 11 '23 17:11 TanklesXL

So for context, I think this used to work? I ran into the issue trying to update testbldr for the new Gleam version, which included updating the gleam_otp dependency.

bcpeinhardt avatar Nov 11 '23 17:11 bcpeinhardt

I wouldn't be surprised if it never actually trapped exits; I remember having to do some funky `rescue` shenanigans in gladvent.

TanklesXL avatar Nov 11 '23 20:11 TanklesXL

I've verified that this behaviour of not trapping exits is independent of whether you use `task.await` or `task.await_forever`. Upon further inspection I suspect there's a race condition in `task.async()`, where `process.monitor_process` may only get called after the task process has already crashed:

pub fn async(work: fn() -> value) -> Task(value) {
 ...
  let pid =
    process.start(linked: True, running: fn() { process.send(subject, work()) })
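  // Note: the task is already running at this point; it may crash
  // before the monitor below is attached, which is the suspected race.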
  let monitor = process.monitor_process(pid)
  ...
      |> process.selecting_process_down(monitor, FromMonitor)
...
}

Notice that the process can only be monitored after the task has already been spawned; this is likely what happens here.

What I find weird though is that when I add a sleep to the task process the exit still isn't trapped, which leads me to believe there might be a bug in `monitor_process` on top of the race condition. Note that the following test still crashes even though the sleep should allow the task process to be monitored correctly:

pub fn async_await_forever_panic_test() {
  task.async(fn() {
    process.sleep(2000)
    panic as "panic"
  })
  |> task.await_forever()
  |> should.be_error
}

If I add a `process.trap_exits` to the beginning of the test I am able to successfully trap the exit message and cause `task.await_forever` to fail with a match error: the task isn't set up with `selecting_trapped_exits`; because it depends on `monitor_process`, it uses `selecting_process_down` instead.
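
To illustrate the mismatch (a minimal sketch, assuming the `gleam/erlang/process` API of the time): with trapping enabled, a linked process's crash arrives as an `ExitMessage`, so a selector built only with `selecting_process_down` never matches it.

pub fn trapped_exit_example() {
  // With trapping on, a linked task's crash is delivered as an
  // ExitMessage instead of killing this process.
  process.trap_exits(True)
  let selector =
    process.new_selector()
    |> process.selecting_trapped_exits(fn(exit) { exit.reason })
  // Receives the exit reason of a crashed linked process.
  process.select_forever(selector)
}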

TanklesXL avatar Nov 13 '23 00:11 TanklesXL

Update: I'm playing around with using `spawn_monitor` rather than `spawn_link` in an attempt to address that race condition, but I'm getting a `badarg` in the demonitor function. I need to investigate this further 🤔

TanklesXL avatar Nov 13 '23 03:11 TanklesXL

Sounds like there's no clear use for this function. Trapping exits is not something one should do unless you're implementing a supervisor, and it will cause all sorts of problems if enabled temporarily, as concurrent processes could crash and cause unexpected behaviour.

Let's remove this function.

lpil avatar Nov 13 '23 17:11 lpil

Sounds like there's no clear use for this function

I'm not sure how you ended up at this conclusion after what I said earlier 😆 I use these task functions all the time; the only caveat is that the user needs to know that if a crash occurs it'll crash the calling process.

Which do you mean to remove? The issue occurs regardless of which await you use. This is not unique to `try_await` or `try_await_forever`; it is also present in `await` and `await_forever`.

Unless you mean removing the task module as a whole?

TanklesXL avatar Nov 13 '23 20:11 TanklesXL

Any functions that require the user to enable trapping exits should be removed. I think that is only `try_await_forever`, as `try_await` returns an error in the case of a timeout.

lpil avatar Nov 13 '23 20:11 lpil

Any functions that require the user to enable trapping exits should be removed. I think that is only `try_await_forever`, as `try_await` returns an error in the case of a timeout.

It does, yes, but they both still allow a panic to crash the calling process; the only difference is the timeout, so they both require trapping exits.

TanklesXL avatar Nov 13 '23 20:11 TanklesXL

A crash is supposed to crash the calling process. Task is not an exception-catching system; that's what supervisors are for.

lpil avatar Nov 13 '23 21:11 lpil

So this has really been bugging me, so I played around a bit more and managed to fix these issues using `spawn_monitor` (not currently included in `gleam/erlang/process`). It required a change to add `process_monitor_from_ref` in `gleam/erlang/process` (this can also be achieved by making that type non-opaque).

In `gleam/erlang/process`:

pub fn process_monitor_from_ref(ref: Reference) -> ProcessMonitor {
  ProcessMonitor(ref)
}

The problem was indeed with how we are attaching monitors to processes after the fact (no idea if there is a bug in the `monitor_process` function, but that's a separate conversation...).

@external(erlang, "erlang", "spawn_monitor")
fn spawn_monitor(running f: fn() -> any) -> #(Pid, Reference)

/// Spawn a task process that calls a given function in order to perform some
/// work. The result of this function is sent back to the parent and can be
/// received using the `await` function.
///
/// See the top level module documentation for more information on async/await.
///
pub fn async(work: fn() -> value) -> Task(value) {
  let owner = process.self()
  let subject = process.new_subject()
  let #(pid, ref) =
    spawn_monitor(running: fn() { process.send(subject, work()) })
  let monitor = process.process_monitor_from_ref(ref)
  let selector =
    process.new_selector()
    |> process.selecting_process_down(monitor, FromMonitor)
    |> process.selecting(subject, FromSubject)
  Task(owner: owner, pid: pid, monitor: monitor, selector: selector)
}

Corresponding passing test cases:

pub fn try_await_forever_panic_test() {
  // Start with an empty mailbox
  flush()

  // Perform an asynchronous task
  // and monitor it until it's done
  let _result =
    task.async(fn() { panic as "hehe" })
    |> task.try_await_forever
    |> should.be_error()

  // Mailbox should be empty;
  // no "DOWN" message should have been sent
  process.self()
  |> get_message_queue_length
  |> should.equal(0)
}

pub fn try_await_panic_test() {
  // Start with an empty mailbox
  flush()

  // Perform an asynchronous task
  // and monitor it until it's done
  let _result =
    task.async(fn() { panic as "haha" })
    |> task.try_await(1000)
    |> should.be_error()

  // Mailbox should be empty;
  // no "DOWN" message should have been sent
  process.self()
  |> get_message_queue_length
  |> should.equal(0)
}

TanklesXL avatar Nov 14 '23 21:11 TanklesXL

We have decided on Discord to rewrite the Task module using the Elixir one as a reference.

lpil avatar Nov 20 '23 14:11 lpil