appsignal-elixir icon indicating copy to clipboard operation
appsignal-elixir copied to clipboard

Track ecto queue time separately

Open tombruijn opened this issue 7 years ago • 24 comments

Currently we track the queue time for ecto queries by combining it with the actual query duration here: https://github.com/appsignal/appsignal-elixir/blob/d991f28d083d374a8d8e125e15dc66e3f78a7eca/lib/appsignal/ecto.ex#L29-L31

This means we lose detail here and are not able to show the queue time separately from the query duration.

An idea might be to create a separate event for the ecto queue duration, e.g. queue.ecto.

- queue.ecto - 100ms (100ms - 10ms (query time) = 90ms queue time)
   - query.ecto - 10ms

To do

  • [ ] Record Ecto events as separate events
  • [ ] Potentially update the extension to not nest Ecto events under the query event, because it's not a sub event of the query(?)

tombruijn avatar Mar 21 '18 10:03 tombruijn

https://app.intercom.io/a/apps/yzor8gyw/inbox/inbox/conversation/15300685751

tombruijn avatar Mar 21 '18 11:03 tombruijn

@jeffkreeftmeijer what do you think? Is adding a separate event here a good solution? It is small enough for the issue week?

tombruijn avatar May 30 '18 15:05 tombruijn

Not sure about the implementation (how do we do this in the Ruby integration?), but this should be small enough to do in the issue week, yes.

jeffkreeftmeijer avatar May 30 '18 15:05 jeffkreeftmeijer

queue duration is not being tracked separately in Ruby for ActiveRecord as ActiveSupport::Notifications doesn't report it separately from what I can tell.

tombruijn avatar May 30 '18 15:05 tombruijn

We can separate this up into queue, query and decode. I don't like nesting the query event in the queue event (as that's not what's happening), I'd rather do something like this:

  • query.ecto
    • queue.ecto
    • db.ecto
    • decode.ecto

Also, we'd need to nest these events in each other, which we currently don't support using Transaction.record_event/6.

jeffkreeftmeijer avatar Jun 13 '18 12:06 jeffkreeftmeijer

https://app.intercom.io/a/apps/yzor8gyw/inbox/inbox/14888/conversations/19079036874

thijsc avatar Oct 17 '18 08:10 thijsc

https://app.intercom.io/a/apps/yzor8gyw/inbox/inbox/unassigned/conversations/23077786339

jeffkreeftmeijer avatar Aug 01 '19 08:08 jeffkreeftmeijer

Let's check if its technically possible in the agent to do this.

tombruijn avatar Aug 06 '19 11:08 tombruijn

@thijsc can you help research if this is possible (with someone involved in Elixir) and if so, implement this in the agent?

wesoudshoorn avatar Sep 24 '19 13:09 wesoudshoorn

Let's discuss with @jeffkreeftmeijer how we can track this in Elixir, Thijs says it's possible to send this to the extension.

tombruijn avatar Nov 18 '19 10:11 tombruijn

This should be a minor change on the Elixir side. Instead of adding all values together, we'd record a separate event for each of them. We then wrap everything in one main query.ecto event (assuming that works on the agent side), to get something that looks like this.

query.ecto
- queue.ecto
- db.ecto
- decode.ecto

jeffkreeftmeijer avatar Nov 18 '19 15:11 jeffkreeftmeijer

Just wondering if there is any progress on this? Not being able to see the database pool queue time really masks potential issues when diagnosing an application slow down.

edds avatar Jul 20 '20 09:07 edds

We're also running in to situations where this info would be really valuable. Using appsignal we can see when queries are slow, but it doesn't go in to detail about what makes this query slow and it can also be misleading e.g. is the query execution time actually slow, or is a particular query prone to be queued for a long time?

aboxshall avatar Sep 28 '20 21:09 aboxshall

https://app.intercom.com/a/apps/yzor8gyw/inbox/inbox/4356044/conversations/16410700011775

shairyar avatar Oct 02 '20 08:10 shairyar

Definitely would find this useful

jay-meister avatar Dec 03 '20 12:12 jay-meister

https://app.intercom.com/a/apps/yzor8gyw/inbox/inbox/4356044/conversations/16410700022582

shairyar avatar Jan 12 '21 14:01 shairyar

Also here to express how much we want/need this functionality 🙌

As @edds already mentioned, it would help with the diagnosis of our applications. We need to be able to identify bottlenecks with more precision be it a long execution query or an event that leads to growing queue times.

andreragsilva avatar May 14 '21 08:05 andreragsilva

We have an app running that right now is suffering a lot from this and we can't really diagnose with ease the problems. This issue would def help on it! 💪

nbernardes avatar May 14 '21 08:05 nbernardes

this would be really useful 🙏

marinho10 avatar May 14 '21 08:05 marinho10

If it helps, a workaround that we used for this was to manually track queue, idle time and query time which we then graphed on a custom dashboard. It wouldn't help when looking at an individual slow query, but did let us spot trends when the database was getting clogged up. It looked something like this:

      :telemetry.attach(
        "my-app-ecto",
        [:my_app, :repo, :query],
        &MyApp.EctoTelemetry.handle_event/4,
        nil
      )

with something along the lines of:

defmodule MyApp.EctoTelemetry do
  def handle_event([:my_app, :repo, :query], measurements, _metadata, _config) do
    case measurements do
      %{queue_time: queue} ->
        Appsignal.add_distribution_value("ecto.queue_time", System.convert_time_unit(queue, :native, :millisecond))

      _ ->
        nil
    end

    case measurements do
      %{idle_time: idle} ->
        Appsignal.add_distribution_value("ecto.idle_time", System.convert_time_unit(idle, :native, :millisecond))

      _ ->
        nil
    end

    case measurements do
      %{query_time: query} ->
        Appsignal.add_distribution_value("ecto.query_time", System.convert_time_unit(query, :native, :millisecond))

      _ ->
        nil
    end
  end
end

edds avatar May 14 '21 10:05 edds

This would also be extremely helpful for us. We're seeing some random outliers with Ecto queries and are not sure if it's a database resource thing or a connection pool thing (or both). Having this available to us in AppSignal would certainly help us diagnose these kinds of problems a whole lot more quickly.

matthewlehner avatar Nov 23 '21 17:11 matthewlehner

https://app.intercom.com/a/inbox/yzor8gyw/inbox/admin/4356044/conversation/16410700247151 (private link)

shairyar avatar Sep 05 '23 08:09 shairyar

@edds many thanks! What you provided will be useful to help us "size" the Ecto pool more easily.

thbar avatar Sep 05 '23 09:09 thbar