prom_ex icon indicating copy to clipboard operation
prom_ex copied to clipboard

[BUG] ecto plugin prefix doesn't match default dashboard in umbrella application

Open doughsay opened this issue 3 years ago • 16 comments

Describe the bug I'm in an umbrella application with several sub-applications each with their own ecto repo, so I'm trying to start the prom_ex ecto plugin for one of them, like this:

{Plugins.Ecto, otp_app: :core}

This generates prometheus metrics like this:

core_prom_ex_ecto_repo_query_execution_time_milliseconds_sum{command="select",repo="Core.Repo",source="core_tickets"} 17.019069

But the provided default ecto grafana dashboard looks like it's using metric names with the prefect_ prefix, e.g.:

sum(irate(prefect_prom_ex_ecto_repo_query_execution_time_milliseconds_sum{job=\"$job\", instance=\"$instance\", repo=\"$repo\"}[$interval])) by(command) / sum(irate(prefect_prom_ex_ecto_repo_query_execution_time_milliseconds_count{job=\"$job\", instance=\"$instance\", repo=\"$repo\"}[$interval])) by(command)
prefect_prom_ex_ecto_repo_query_execution_time_milliseconds_sum
!=
core_prom_ex_ecto_repo_query_execution_time_milliseconds_sum

prefect is the name of my monitoring application, core is the name of another app in the same umbrella.

I want to eventually be able to start multiple of these plugins:

{Plugins.Ecto, otp_app: :core},
{Plugins.Ecto, otp_app: :ops},
{Plugins.Ecto, otp_app: :commerce}

It looks like the grafana dashboard can handle multiple repos, it just doesn't expect them to all be handled by different otp_apps. The dashboard being uploaded is making the implicit assumption that the app running prom_ex is the same app that runs the ecto repo. What can we do about this? Would love to get this working! 😄

To Reproduce Steps to reproduce the behavior:

  1. Create an umbrella application with two apps, e.g.: my_monitor + my_app.
  2. Install ecto into my_app and prom_ex into my_monitor
  3. Attempt to use the prom_ex ecto plugin to monitor the ecto repo in my_app.

Expected behavior I'm not sure what to expect yet, I think I need to think about it more...

Environment

  • Elixir version: 1.11.3
  • Erlang/OTP version: 23.2.5
  • Grafana version: v7.4.2 (29e75ad97b)
  • Prometheus version: not exactly sure, it's the official docker image prometheus:latest

Additional context A lot of what I've tried so far is working great. The BEAM plugin + dashboard is working, the phoenix plugin + dashboard is working (even though my phoenix app is in yet another otp_app in my umbrella). It's just the ecto plugin making this bad assumption. I have not yet tried oban, but I'm worried about that one too, we have two separate oban installations inside our umbrella as well.

doughsay avatar Feb 27 '21 01:02 doughsay

Hey Chris!

There are actually some undocumented features in PromEx that allow you to do this. I need to tidy up that API prior to exposing that functionality though. I should be able to wrap this up by middle-end of next week :).

akoutmos avatar Feb 27 '21 02:02 akoutmos

If it is not too much trouble @doughsay, would you mind taking branch supporting_multiple_plugin_definitions for a test drive (PR https://github.com/akoutmos/prom_ex/pull/32)? I don't have a sample umbrella application with multiples repos that i can test against on hand and could use the additional validation.

Your configuration should look something like so:

@impl true
def plugins do
  [
    {Plugins.Ecto, otp_app: :core},
    {Plugins.Ecto, otp_app: :ops},
    {Plugins.Ecto, otp_app: :commerce}
  ]
end

@impl true
def dashboards do
  [
    {:prom_ex, "ecto.json", otp_app: :core, title: "Core - PromEx Ecto Dashboard"},
    {:prom_ex, "ecto.json", otp_app: :ops, title: "Ops - PromEx Ecto Dashboard"},
    {:prom_ex, "ecto.json", otp_app: :commerce, title: "Commerce - PromEx Ecto Dashboard"}
  ]
end

akoutmos avatar Feb 28 '21 22:02 akoutmos

This looks great! I will try and get to it today and let you know, thanks!

doughsay avatar Mar 01 '21 18:03 doughsay

Thanks @doughsay! Appreciate the speedy testing :D

akoutmos avatar Mar 01 '21 18:03 akoutmos

Hey! I managed to try this out tonight, and here are my findings:

The grafana dashboard isn't / wasn't working because it isn't using the right queries to find the variables:

  • e.g.: there is no ops_prom_ex_prom_ex_status_info, because the otp_app running prom_ex is prefect, not ops (so I changed these to prefect_prom_ex_prom_ex_status_info)
  • e.g.: there is no ops_prom_ex_ecto_repo_init_status_info in my metrics output, not sure where this is supposed to come from. I changed this to just any old query that returned a set of repos. (this is also why the smaller boxes on the dashboard are all "no data". where am I supposed to be getting this from?)

DeepinScreenshot_select-area_20210301183346

After that it "worked":

DeepinScreenshot_select-area_20210301181434

...but as you can see, all the repos are being reported for each plugin instance. Not sure why this is, because each of my applications is configured to only use one repo each:

iex([email protected])2> Application.get_env(:core, :ecto_repos)
[Core.Repo]
iex([email protected])3> Application.get_env(:ops, :ecto_repos)
[Ops.Repo]

I can go either way on having multiple dashboards vs a single dashboard with a repo drop-down, it doesn't matter to me which way it works. But my guess is that the intent is that for a given prom_ex ecto plugin, it should correspond to one grafana dashboard, and IF that one plugin collects events for multiple repos (i.e. because the otp_app is actually configured with multiple repos) then the drop-down comes into play. In my case, this isn't the case, I have a one-to-one mapping of apps to repos.

Anyway... if you think it would help, I was thinking of putting together an example umbrella app with a similar setup to how we have it so we can work on getting all of prom_ex working using that sample instead of going back and forth on my actual application from work... Would that help?

doughsay avatar Mar 02 '21 02:03 doughsay

Sorry that that didn't work out as I expected :/. If you wouldn't mind putting together a sample umbrella app for me to test against, that would probably be the best way to root cause and fix this one.

akoutmos avatar Mar 02 '21 04:03 akoutmos

Hey @doughsay! Any luck on getting a sample app put together for this?

akoutmos avatar Mar 14 '21 22:03 akoutmos

It just so happens I'm working on it today! It's not quite done yet, but I'll let you know as soon as it is 😁

doughsay avatar Mar 14 '21 22:03 doughsay

Awesome! Appreciate you putting that together :).

akoutmos avatar Mar 14 '21 22:03 akoutmos

Hey @akoutmos,

I've run out of time for today, but I have something that at least seems to expose a few problems so far: https://github.com/doughsay/prom_ex_umbrella_example

It's a bit rough around the edges, but I hope you can get it running. I hacked in a quick ecto query by visiting the index page of the phoenix app "WebOne", just to get ecto doing something basic.

Things of note:

  • The phoenix metrics for both phoenix apps appear to be combined, even though I've defined separate instances of the phoenix plugin. Maybe I can fix this by giving each endpoint a unique event prefix?
  • The phoenix metrics all begin with prom_metrics_prom_ex_phoenix_, even though the otp app is not prom_metrics. Can I fix this with passing in otp_app: ... in the plugin?
  • The ecto plugin seems to create metrics for all ecto repos in the umbrella, even when explicitly given an otp_app options. e.g.: app_two_prom_ex_ecto_repo_* prefixes are exporting metrics with label repo="AppOne.Repo".

I hope to clarify things a bit more when I have more time, but I hope this at least gets us started.

doughsay avatar Mar 15 '21 01:03 doughsay

Awesome work! I'll take a look at the repro project and get back to you with some ideas. Thanks :)

akoutmos avatar Mar 15 '21 18:03 akoutmos

Just FYI, I realized my instructions for getting things running are missing a step: you have to run the ecto migrations and db creation using mix, but the config files are set up to only work while running inside docker. The quickest workaround is:

  • edit the config/dev.exs file to change the db hostname from postgres to localhost
  • run mix ecto.create and mix ecto.migrate
  • edit the config file back
  • run in docker

...or I guess maybe you can docker exec ... into the running container and run the mix tasks in there? I dunno, I hope you can figure it out! 😅

doughsay avatar Mar 15 '21 18:03 doughsay

I should be able to get to this early next week. Started digging in and I think I have a plan as to how to move forward.

akoutmos avatar Mar 20 '21 03:03 akoutmos

Just bumping this one so you don't think that I forgot about it :P. I've played around with a couple different ideas as to how this can cleanly be supported. Shooting to have something reviewable by next week!

akoutmos avatar Apr 01 '21 20:04 akoutmos

@akoutmos thanks for the update! I myself have been super busy lately and was not able to get back to rounding out that example app more past just the bare bones I sent you... Is what's there good enough to fully demonstrate the problem? Or could you still benefit from me adding more examples to the example app?

Let me know and I'll see if I can many get some more work done on it soon!

doughsay avatar Apr 01 '21 21:04 doughsay

Appreciate it! The example you provided is more than sufficient. I'm working on also integrating that into the example applications directory as that will be a good E2E test app.

akoutmos avatar Apr 01 '21 21:04 akoutmos