
Possible Bug: `try_await_forever` letting panic through

Open · bcpeinhardt opened this issue 7 months ago • 15 comments

My understanding of `task.try_await_forever` is that it should catch the case where a process panics and return an error. However, it seems to cause the calling process to panic instead:

pub fn main() {
  io.println("Try await a panicking task")
  let _ = task.try_await_forever(task.async(fn() { panic }))

  // This line is never reached
  io.println("Try await caught the error")
}

Is this a bug or am I misunderstanding the function?

bcpeinhardt avatar Nov 11 '23 07:11 bcpeinhardt

For errors not to propagate you'll need to make the process that spawns the task trap exits.
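
For illustration, a minimal sketch of that setup (`trap_exits` lives in `gleam/erlang/process`); note that, as discussed further down the thread, this alone still doesn't make `try_await_forever` return an error, because the task's selector isn't set up for trapped exits:

pub fn main() {
  // Trap exits in the parent so a linked task's crash is delivered as a
  // message rather than taking this process down with it.
  process.trap_exits(True)
  let _ = task.try_await_forever(task.async(fn() { panic }))
}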

I agree though, this API is a bit confusing. I'm not sure what the intention is for how and when to use it. @TanklesXL do you remember better than I do?

lpil avatar Nov 11 '23 12:11 lpil

From what I remember, when I tried this my assumption was that trapping exits was enabled by default. Maybe this is something we want as a parameter to `task.async`, or maybe a separate function, something like a `task.async_trap`? It just seemed weird to me that trapping exits wasn't the default behaviour, since it would be easy to return an error if the task process crashed.
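
As a purely hypothetical sketch of that idea (no such function exists in gleam_otp), `task.async_trap` might look like this; the trap has to be enabled in the calling process, since the parent is the one that receives the exit signal from the linked task:

pub fn async_trap(work: fn() -> value) -> Task(value) {
  // Hypothetical: enable trapping in the parent before spawning the
  // linked task, so its crash arrives as a message.
  process.trap_exits(True)
  async(work)
}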

TanklesXL avatar Nov 11 '23 16:11 TanklesXL

I suspect part of the confusion here is that when I used this last I assumed the await function would be able to preemptively handle any trapping, but I don't think that makes sense: this is something that needs to be set up when the task process is spawned.

TanklesXL avatar Nov 11 '23 16:11 TanklesXL

For the historical context side of things, `try_await_forever` was added after the original `try_await` to allow for cases where you don't necessarily want or care about a timeout on your task.

TanklesXL avatar Nov 11 '23 17:11 TanklesXL

So for context, I think this used to work? I ran into the issue trying to update testbldr for the new Gleam version, which included updating the gleam_otp dependency.

bcpeinhardt avatar Nov 11 '23 17:11 bcpeinhardt

I wouldn't be surprised if it never actually trapped exits; I remember having to do some funky `rescue` shenanigans in gladvent.

TanklesXL avatar Nov 11 '23 20:11 TanklesXL

I've verified that this behaviour of not trapping exits is independent of whether you use `task.await` or `task.await_forever`. Upon further inspection I suspect there's a race condition in `task.async()`, where `process.monitor_process` may only get called after the task process has already crashed:

pub fn async(work: fn() -> value) -> Task(value) {
 ...
  let pid =
    process.start(linked: True, running: fn() { process.send(subject, work()) })
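  // Note: the task is already running at this point; it may crash
  // before the monitor below is attached, which is the suspected race.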
  let monitor = process.monitor_process(pid)
  ...
      |> process.selecting_process_down(monitor, FromMonitor)
...
}

Notice that the process can only be monitored after the task has already been spawned; this is likely what happens here.

What I find weird though is that when I add a sleep to the task process the exit still isn't trapped, which leads me to believe there might be a bug in `monitor_process` on top of the race condition. Note that the following test still crashes even though the sleep should allow the task process to be monitored correctly:

pub fn async_await_forever_panic_test() {
  task.async(fn() {
    process.sleep(2000)
    panic as "panic"
  })
  |> task.await_forever()
  |> should.be_error
}

If I add a `process.trap_exits` to the beginning of the test I am able to successfully trap the exit message and cause `task.await_forever` to fail with a match error: the task isn't set up with `selecting_trapped_exits`; because it depends on `monitor_process`, it uses `selecting_process_down` instead.
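
To illustrate the mismatch (a minimal sketch, assuming the `gleam/erlang/process` API of the time): with trapping enabled, a linked process's crash arrives as an `ExitMessage`, so a selector built only with `selecting_process_down` never matches it.

pub fn trapped_exit_example() {
  // With trapping on, a linked task's crash is delivered as an
  // ExitMessage instead of killing this process.
  process.trap_exits(True)
  let selector =
    process.new_selector()
    |> process.selecting_trapped_exits(fn(exit) { exit.reason })
  // Receives the exit reason of a crashed linked process.
  process.select_forever(selector)
}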

TanklesXL avatar Nov 13 '23 00:11 TanklesXL

Update: I'm playing around with using `spawn_monitor` rather than `spawn_link` in an attempt to address that race condition, but I'm getting a `badarg` in the demonitor function. I need to investigate this further 🤔

TanklesXL avatar Nov 13 '23 03:11 TanklesXL

Sounds like there's no clear use for this function. Trapping exits is not something one should do unless you're implementing a supervisor, and it will cause all sorts of problems if enabled temporarily, as concurrent processes could crash and cause unexpected behaviour.

Let's remove this function.

lpil avatar Nov 13 '23 17:11 lpil

Sounds like there's no clear use for this function

I'm not sure how you ended up at this conclusion after what I said earlier 😆 I use these task functions all the time; the only caveat is that the user needs to know that if a crash occurs it'll crash the calling process.

Which do you mean to remove? The issue occurs regardless of which await you use. This is not unique to `try_await` or `try_await_forever`; it is also present in `await` and `await_forever`.

Unless you mean removing the task module as a whole?

TanklesXL avatar Nov 13 '23 20:11 TanklesXL

Any functions that require the user to enable trapping exits should be removed. I think that is only `try_await_forever`, as `try_await` returns an error in the case of a timeout.

lpil avatar Nov 13 '23 20:11 lpil

Any functions that require the user to enable trapping exits should be removed. I think that is only `try_await_forever`, as `try_await` returns an error in the case of a timeout.

It does, yes, but they both still allow a panic to crash the calling process; the only difference is the timeout, so they both require trapping exits.

TanklesXL avatar Nov 13 '23 20:11 TanklesXL

A crash is supposed to crash the calling process. Task is not an exception-catching system; that's what supervisors are for.

lpil avatar Nov 13 '23 21:11 lpil

So this has really been bugging me, so I played around a bit more and managed to fix these issues using `spawn_monitor` (not currently included in `gleam/erlang/process`). It required a change to add `process_monitor_from_ref` in `gleam/erlang/process` (this can also be achieved by making that type non-opaque).

In `gleam/erlang/process`:

pub fn process_monitor_from_ref(ref: Reference) -> ProcessMonitor {
  ProcessMonitor(ref)
}

The problem was indeed with how we are attaching monitors to processes after the fact (no idea if there is a bug in the `monitor_process` function, but that's a separate conversation...).

@external(erlang, "erlang", "spawn_monitor")
fn spawn_monitor(running f: fn() -> any) -> #(Pid, Reference)

/// Spawn a task process that calls a given function in order to perform some
/// work. The result of this function is sent back to the parent and can be
/// received using the `await` function.
///
/// See the top level module documentation for more information on async/await.
///
pub fn async(work: fn() -> value) -> Task(value) {
  let owner = process.self()
  let subject = process.new_subject()
  let #(pid, ref) =
    spawn_monitor(running: fn() { process.send(subject, work()) })
  let monitor = process.process_monitor_from_ref(ref)
  let selector =
    process.new_selector()
    |> process.selecting_process_down(monitor, FromMonitor)
    |> process.selecting(subject, FromSubject)
  Task(owner: owner, pid: pid, monitor: monitor, selector: selector)
}

Corresponding passing test cases:

pub fn try_await_forever_panic_test() {
  // Start with an empty mailbox
  flush()

  // Perform an asynchronous task
  // and monitor it until it's done
  let _result =
    task.async(fn() { panic as "hehe" })
    |> task.try_await_forever
    |> should.be_error()

  // Mailbox should be empty;
  // no "DOWN" message should have been sent
  process.self()
  |> get_message_queue_length
  |> should.equal(0)
}

pub fn try_await_panic_test() {
  // Start with an empty mailbox
  flush()

  // Perform an asynchronous task
  // and monitor it until it's done
  let _result =
    task.async(fn() { panic as "haha" })
    |> task.try_await(1000)
    |> should.be_error()

  // Mailbox should be empty;
  // no "DOWN" message should have been sent
  process.self()
  |> get_message_queue_length
  |> should.equal(0)
}

TanklesXL avatar Nov 14 '23 21:11 TanklesXL

We have decided on Discord to rewrite the Task module using the Elixir one as a reference.

lpil avatar Nov 20 '23 14:11 lpil