prom_ex icon indicating copy to clipboard operation
prom_ex copied to clipboard

[BUG] Ecto dashboard not showing the repo name

Open pallix opened this issue 1 year ago • 17 comments

Describe the bug

When using the ecto plugin, the drop list menu on the dashboard does not show the repo:

grafana

Environment

  • Elixir version: Elixir 1.13.4 (compiled with Erlang/OTP 24)
  • Erlang/OTP version: Erlang/OTP 24
  • [erts-12.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]
  • Grafana version: 7.5.4
  • Prometheus version: 2.26.0

pallix avatar Jan 30 '23 13:01 pallix

Did you make sure that the PromEx application is first in the children in application.ex? It will not catch the init events if it does not start before ecto/oban/etc.

wkpatrick avatar Apr 10 '23 23:04 wkpatrick

Did you make sure that the PromEx application is first in the children in application.ex? It will not catch the init events if it does not start before ecto/oban/etc.

Thanks for reaching out. No, it's not always started as first child in our applications. So it needs the init events to find the repo names?

pallix avatar Apr 11 '23 10:04 pallix

Correct, I had the exact same behavior, which I fixed by putting PromEx as the first child to be started.

wkpatrick avatar Apr 12 '23 01:04 wkpatrick

Are we sure about that, @wkpatrick? I am not disagreeing, but I am getting confused by the code:

The telemetry attachment should send the repo tag_values, which is calculated here:

https://github.com/akoutmos/prom_ex/blob/6a002fb029cf00eeadfcfef7ce984cd7a0f83493/lib/prom_ex/plugins/ecto.ex#L246

What could happen is that you have yet to receive any information; therefore, that list is empty.

But I don't see any requirement about the order of the processes.

You do need a :ecto_repos configuration under your otp_app: https://github.com/akoutmos/prom_ex/blob/6a002fb029cf00eeadfcfef7ce984cd7a0f83493/lib/prom_ex/plugins/ecto.ex#L55 or pass :repos options https://github.com/akoutmos/prom_ex/blob/6a002fb029cf00eeadfcfef7ce984cd7a0f83493/lib/prom_ex/plugins/ecto.ex#L54.

@pallix, please confirm,

  1. You are setting up the repos configuration for the Ecto plugin, or if you are not using the repos configuration, you have a configured ecto_repos under the namespace of the otp_app named pass a configuration.

  2. You ran the application and generated metrics to look at.

yordis avatar Apr 20 '23 04:04 yordis

@yordis

The ecto dashboard parses the repo name out of the init metrics label_values(<repo name>_prom_ex_ecto_repo_init_status_info, repo) Which are event based and defined here: https://github.com/akoutmos/prom_ex/blob/6a002fb029cf00eeadfcfef7ce984cd7a0f83493/lib/prom_ex/plugins/ecto.ex#L74

The event is only fired once when the Repo supervisor starts. Therefore, if the prom_ex application starts after Ecto, it will not have received the telemetry event for init. Which means it does not generate the promethus metric format for the init, so the dashboard is unable to generate the label values for it. https://github.com/elixir-ecto/ecto/blob/05e4669ab28a7559ff5fcc0af7216faf9df7b997/lib/ecto/repo/supervisor.ex#L179

wkpatrick avatar Apr 20 '23 17:04 wkpatrick

Right right, but that is true only for the init, right?! (just confirming) but other metrics should eventually fill up that field, no?

Being said, probably prudent to initiate PromEx before, regardless.

@pallix please confirm that reordering the processes fix the issue.

yordis avatar Apr 20 '23 17:04 yordis

Nope. _prom_ex_ecto_repo_init_status_info is only filled on init. Other metrics contain the repo name, but only init_status_info is parsed to generate the labels in the dashboard.

wkpatrick avatar Apr 20 '23 17:04 wkpatrick

@pallix please confirm that reordering the processes fix the issue.

I will try to test it this week! Thank you.

pallix avatar Apr 23 '23 19:04 pallix

The suggested fix works, than you very much :hugs:

pallix avatar Apr 25 '23 09:04 pallix

@pallix do you mind creating a PR updating the docs, or closing the issue?

yordis avatar Apr 25 '23 18:04 yordis

@pallix do you mind creating a PR updating the docs, or closing the issue?

Shouldn't it be solve at the code level and the order of the children made irrelevant? or it's not practical?

pallix avatar Apr 25 '23 20:04 pallix

🤷🏻 I am not sure. I think we can take baby steps on the topic. The documentation should be enough for now, and I will commit to fixing it. The following person could do that if you don't have time.

@wkpatrick is more knowledgeable here, so maybe he knows some limitations and whatnot.

Give it a try if you want to 🚀

yordis avatar Apr 25 '23 20:04 yordis

Peeps, any updates over here?!

yordis avatar May 05 '23 19:05 yordis

@pallix do you mind creating a PR updating the docs, or closing the issue?

Shouldn't it be solve at the code level and the order of the children made irrelevant? or it's not practical?

Given that the Telemetry events are synchronous and not captured/stored by any other mechanism, the PromEx supervision tree must be started prior to starting the Repo supervision trees or else the Telemetry events will be missed with no way to retroactively get their measurements and metadata. I'll have to look into whether there is a better way to capture Ecto init data (perhaps don't even attach to the init event and just query the repo module directly for the same data), but for now PromEx should come first in the supervision tree. If anyone has some bandwidth, a documentation update would be appreciated!

akoutmos avatar May 06 '23 14:05 akoutmos

@akoutmos do you think we should tell people to start the PromEx supervisor as soon as possible? Even before forming clusters? or what????

yordis avatar May 06 '23 23:05 yordis

I will try to find time for a pull request in the next days

pallix avatar May 08 '23 08:05 pallix

Actually it's already in the doc since two years :-):

(be sure to add it to the top of the supervisor children list so that you do not miss any init-style events from other processes like Ecto.Repo for example)

I suggest the two followings changes to improve the doc:

  • keep the sentence but let it stands on its own, not within parenthesis. This is too important to be in parenthesis.
  • change the following code example to add MyCoolApp.Repo in the supervisor tree, just after the PromEx child. This way we also have an example of what shoud be done.

I will adress both with a PR.

pallix avatar May 08 '23 08:05 pallix