appsignal-elixir
appsignal-elixir copied to clipboard
Track ecto queue time separately
Currently we track the queue time for ecto queries by combining it with the actual query duration here: https://github.com/appsignal/appsignal-elixir/blob/d991f28d083d374a8d8e125e15dc66e3f78a7eca/lib/appsignal/ecto.ex#L29-L31
This means we lose detail here and are not able to show the queue time separately from the query duration.
An idea might be to create a separate event for the ecto queue duration, e.g. queue.ecto.
- queue.ecto - 100ms (100ms - 10ms (query time) = 90ms queue time)
- query.ecto - 10ms
To do
- [ ] Record Ecto events as separate events
- [ ] Potentially update the extension to not nest Ecto events under the query event, because it's not a sub event of the query(?)
https://app.intercom.io/a/apps/yzor8gyw/inbox/inbox/conversation/15300685751
@jeffkreeftmeijer what do you think? Is adding a separate event here a good solution? It is small enough for the issue week?
Not sure about the implementation (how do we do this in the Ruby integration?), but this should be small enough to do in the issue week, yes.
queue duration is not being tracked separately in Ruby for ActiveRecord as ActiveSupport::Notifications doesn't report it separately from what I can tell.
We can separate this up into queue, query and decode. I don't like nesting the query event in the queue event (as that's not what's happening), I'd rather do something like this:
- query.ecto
- queue.ecto
- db.ecto
- decode.ecto
Also, we'd need to nest these events in each other, which we currently don't support using Transaction.record_event/6.
https://app.intercom.io/a/apps/yzor8gyw/inbox/inbox/14888/conversations/19079036874
https://app.intercom.io/a/apps/yzor8gyw/inbox/inbox/unassigned/conversations/23077786339
Let's check if its technically possible in the agent to do this.
@thijsc can you help research if this is possible (with someone involved in Elixir) and if so, implement this in the agent?
Let's discuss with @jeffkreeftmeijer how we can track this in Elixir, Thijs says it's possible to send this to the extension.
This should be a minor change on the Elixir side. Instead of adding all values together, we'd record a separate event for each of them. We then wrap everything in one main query.ecto event (assuming that works on the agent side), to get something that looks like this.
query.ecto
- queue.ecto
- db.ecto
- decode.ecto
Just wondering if there is any progress on this? Not being able to see the database pool queue time really masks potential issues when diagnosing an application slow down.
We're also running in to situations where this info would be really valuable. Using appsignal we can see when queries are slow, but it doesn't go in to detail about what makes this query slow and it can also be misleading e.g. is the query execution time actually slow, or is a particular query prone to be queued for a long time?
https://app.intercom.com/a/apps/yzor8gyw/inbox/inbox/4356044/conversations/16410700011775
Definitely would find this useful
https://app.intercom.com/a/apps/yzor8gyw/inbox/inbox/4356044/conversations/16410700022582
Also here to express how much we want/need this functionality 🙌
As @edds already mentioned, it would help with the diagnosis of our applications. We need to be able to identify bottlenecks with more precision be it a long execution query or an event that leads to growing queue times.
We have an app running that right now is suffering a lot from this and we can't really diagnose with ease the problems. This issue would def help on it! 💪
this would be really useful 🙏
If it helps, a workaround that we used for this was to manually track queue, idle time and query time which we then graphed on a custom dashboard. It wouldn't help when looking at an individual slow query, but did let us spot trends when the database was getting clogged up. It looked something like this:
:telemetry.attach(
"my-app-ecto",
[:my_app, :repo, :query],
&MyApp.EctoTelemetry.handle_event/4,
nil
)
with something along the lines of:
defmodule MyApp.EctoTelemetry do
def handle_event([:my_app, :repo, :query], measurements, _metadata, _config) do
case measurements do
%{queue_time: queue} ->
Appsignal.add_distribution_value("ecto.queue_time", System.convert_time_unit(queue, :native, :millisecond))
_ ->
nil
end
case measurements do
%{idle_time: idle} ->
Appsignal.add_distribution_value("ecto.idle_time", System.convert_time_unit(idle, :native, :millisecond))
_ ->
nil
end
case measurements do
%{query_time: query} ->
Appsignal.add_distribution_value("ecto.query_time", System.convert_time_unit(query, :native, :millisecond))
_ ->
nil
end
end
end
This would also be extremely helpful for us. We're seeing some random outliers with Ecto queries and are not sure if it's a database resource thing or a connection pool thing (or both). Having this available to us in AppSignal would certainly help us diagnose these kinds of problems a whole lot more quickly.
https://app.intercom.com/a/inbox/yzor8gyw/inbox/admin/4356044/conversation/16410700247151 (private link)
@edds many thanks! What you provided will be useful to help us "size" the Ecto pool more easily.