absinthe icon indicating copy to clipboard operation
absinthe copied to clipboard

Allow async timeouts to be rescued and an error inserted for resolution

Open blatyo opened this issue 3 years ago • 11 comments

Environment

  • Elixir version (elixir -v):

    Erlang/OTP 23 [erts-11.1.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe]
    
    Elixir 1.11.1 (compiled with Erlang/OTP 23)
    
  • Absinthe version (mix deps | grep absinthe):

    • absinthe 1.5.3 (Hex package) (mix) locked at 1.5.3 (absinthe) 69a170f3
    • absinthe_plug 1.5.0 (Hex package) (mix) locked at 1.5.0 (absinthe_plug) 4c160f4c
  • Client Framework and version (Relay, Apollo, etc):

    • @apollo/client: 3.2.9

Expected behavior

Async time outs cause the field being resolved to have an error. Additionally, this error should be reported to telemetry.

Actual behavior

Async time out sends an exit signal that kills the request and causes a 500.

Request: POST /
** (exit) exited in: Task.await(%Task{owner: #PID<0.10463.1>, pid: #PID<0.10479.1>, ref: #Reference<0.4029714311.3707240457.87870>}, 30000)
    ** (EXIT) time out

Relevant Schema/Middleware Code

https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/middleware/async.ex#L80

blatyo avatar Nov 05 '21 14:11 blatyo

This is something that's been affecting us for quite some time, as we use Absinthe as a mobile BFF. We worked around it by explicitly using Task.async_stream in our resolvers and providing "graceful" fallback responses when timeouts occur.

Would a PR be accepted that uses Task.async_stream instead of Task.await for this middleware?

Catharz avatar Nov 17 '21 22:11 Catharz

No this is not a good fit for Task.async_stream, there isn't an enumerable to feed to it. Instead we could use a combo of Task.yield + Task.shutdown.

benwilson512 avatar Nov 24 '21 16:11 benwilson512

No this is not a good fit for Task.async_stream, there isn't an enumerable to feed to it. Instead we could use a combo of Task.yield + Task.shutdown.

Thanks for the feedback @benwilson512. I had started work on this (accumulating a list of fields to be resolved async), but will pivot to using ☝️ .

Catharz avatar Nov 24 '21 22:11 Catharz

Likely this should be opt-in since it'd be a pretty meaningful behavior change for everyone - this behavior has been around for a long time...

The batch plugin should likely implement this as well if we are doing async.

FYI, I have an implementation of this pattern for async and batch laying around in a project or two, I can try and dig it out soon...

binaryseed avatar Nov 24 '21 22:11 binaryseed

Likely this should be opt-in since it'd be a pretty meaningful behavior change for everyone - this behavior has been around for a long time...

I aim to have this be backwards compatible. I don't want to break anybody else's code.

The batch plugin should likely implement this as well if we are doing async.

I'll take a look at that too.

FYI, I have an implementation of this pattern for async and batch laying around in a project or two, I can try and dig it out soon...

I'd be very interested to see how you handled this.

Catharz avatar Nov 24 '21 23:11 Catharz

It's not too complicated but there are decisions to be made about what kind of error to resolve & how to roll this out

https://github.com/absinthe-graphql/absinthe/compare/master...binaryseed:async-batch-error-handling?expand=1

binaryseed avatar Nov 25 '21 04:11 binaryseed

I ended up with something that looks a little simpler. Not sure what I might be missing:

  def call(%{state: :suspended} = res, {task, [{:fallback, fallback}]}),
    do: call(res, {task, [{:timeout, 30_000}, {:fallback, fallback}]})

  def call(%{state: :suspended} = res, {task, opts = [{:timeout, _}, {:fallback, _}]}) do
    result =
      case Task.yield(task, opts[:timeout] || 30_000) || Task.shutdown(task) do
        {:ok, result} ->
          result

        nil ->
          opts[:fallback]
      end

    res
    |> Absinthe.Resolution.put_result(result)
  end

I've tried to make sure it works with the :timeout, :fallback, both or neither (without changing the existing behaviour). In our case, we actually don't want an error. A timeout means some service didn't respond, but we still want a "valid" response sent to our clients. That's typically a map, a list or a boolean.

I'll be looking at the batch part tomorrow, and will submit a PR after that. I have no idea how to test the timeouts or fallback in a nice way though. 😞

Catharz avatar Nov 25 '21 05:11 Catharz

It's not too complicated but there are decisions to be made about what kind of error to resolve & how to roll this out

https://github.com/absinthe-graphql/absinthe/compare/master...binaryseed:async-batch-error-handling?expand=1

That's fairly close to what I want, but the problem is it returns errors when tasks timeout. Timeouts are expected in our environment, so what we're after is a graceful fallback when they occur.

Thinking about it: it's maybe not idiomatic to return a fallback instead of an :error tuple? But the :error tuple doesn't suit our needs. So, does this actually belong in Absinthe, or should we just write our own middleware implementation that does it the way we require?

Catharz avatar Nov 26 '21 03:11 Catharz

Yes, I don't know that a hard-coded fallback would be the best general solution here, I'm sure different use cases would call for more dynamic responses. In our implementation, we had a bunch of logic we used to determine what error to show based on what kind of user was present.

And code-wise, Task.yield can return an {:exit, term()} tuple in the case when the Task process itself exits... That would need be handled.

It's pretty straightforward to copy in the absinthe Async implementation and modify it however you want, so if that works for you I'd recommend it...

binaryseed avatar Nov 26 '21 23:11 binaryseed

Yes, I don't know that a hard-coded fallback would be the best general solution here, I'm sure different use cases would call for more dynamic responses. In our implementation, we had a bunch of logic we used to determine what error to show based on what kind of user was present.

I was going for a :fallback option, rather than something hard-coded.

So if you wanted to use it, it'd be something like this:

  field :time_consuming, :thing do
    resolve fn _, _, _ ->
      async(fn ->
        {:ok, long_time_consuming_function()}
      end, fallback: %{})
    end
  end

with both timeout and fallback being optional.

The way I implemented it (to maintain the existing behaviour) is:

  def call(%{state: :suspended} = res, {task, [{:fallback, fallback}]}),
    do: call(res, {task, [{:timeout, 30_000}, {:fallback, fallback}]})

  def call(%{state: :suspended} = res, {task, opts = [{:timeout, _}, {:fallback, _}]}) do
    result =
      case Task.yield(task, opts[:timeout] || 30_000) || Task.shutdown(task) do
        {:ok, result} ->
          result

        nil ->
          opts[:fallback]
      end

    res
    |> Absinthe.Resolution.put_result(result)
  end

  def call(%{state: :suspended} = res, {task, opts}) do
    result = Task.await(task, opts[:timeout] || 30_000)

    res
    |> Absinthe.Resolution.put_result(result)
  end

Catharz avatar Nov 29 '21 22:11 Catharz

Is there any update on this? I would love to have a cleaner way of handling timeouts in our resolvers. We've been using batch more frequently and unfortunately in most cases the default 5s timeout can be exceeded - leaving us with a 500 crash and no helpful error logging to see which batch resolver is struggling.

rossvz avatar Jun 14 '22 15:06 rossvz