LGTM Quarkus Dashboard

Status: Open. melloware opened this pull request.

Fix #40933: LGTM Quarkus Dashboard

cc @brunobat @alesj

  • Adds Quarkus Micrometer Dashboard including JVM, HTTP, JDBC and other stats provided by Micrometer.
  • I just need some guidance on the TODO about how to get those values properly.

[screenshot]

melloware, Jun 17 '24

OK, I resolved that issue.

melloware, Jun 18 '24


:waning_crescent_moon: This workflow status is outdated as a new workflow run has been triggered.


Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 1c41146479bd1bb7432f3a3363655128b9b45642.

:white_check_mark: The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

quarkus-bot[bot], Jul 05 '24


:waning_crescent_moon: This workflow status is outdated as a new workflow run has been triggered.


Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 0b8c01b615c8b72894b613d7640f06b6a0b348e9.

:white_check_mark: The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

quarkus-bot[bot], Aug 13 '24

@melloware, back from vacation. I wonder if we could deploy this dashboard to the Grafana repository as well. I will need some time to review this.

brunobat, Aug 16 '24

Yeah, I put this here because I wasn't sure whether we wanted to get feedback from other Quarkus users on anything missing that they'd want on there.

melloware, Aug 16 '24

I also think the colors and times for uptime might need to be tweaked, but I thought it was a pretty good first stab 😄

melloware, Aug 16 '24


:waning_crescent_moon: This workflow status is outdated as a new workflow run has been triggered.


Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 1b6fca0d07832af963d637dc9bfbbac907227961.

:white_check_mark: The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

quarkus-bot[bot], Aug 20 '24


:waning_crescent_moon: This workflow status is outdated as a new workflow run has been triggered.


Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 11ede0d504ba36578a4f53cd2f2b99492f9b46a4.

:white_check_mark: The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

quarkus-bot[bot], Sep 16 '24

Also, we should move the HTTP panels up.

brunobat, Sep 17 '24

Yes, this was more of an idea. As for the startup-time panels on the dashboard, those eventually fill in if you leave the app running.

melloware, Sep 17 '24

@brunobat OK, I updated it to move the HTTP Endpoints panels up. Here is what mine looks like. I'm not sure why yours shows N/A at first, unless you are not waiting 30 seconds for the scrape?

[screenshot]

melloware, Sep 17 '24

@melloware strange... I waited a few minutes. What app are you using to feed data to the dashboard? I'd like to take a look at the dependencies.

brunobat, Sep 17 '24

Hmm, I was just using a REST app with:

<dependencies>
    <dependency>
        <groupId>io.quarkus</groupId>
        <artifactId>quarkus-rest-jackson</artifactId>
    </dependency>
    <dependency>
        <groupId>io.quarkus</groupId>
        <artifactId>quarkus-observability-devservices-lgtm</artifactId>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>io.quarkus</groupId>
        <artifactId>quarkus-opentelemetry</artifactId>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry.instrumentation</groupId>
        <artifactId>opentelemetry-jdbc</artifactId>
    </dependency>
    <dependency>
        <groupId>io.quarkus</groupId>
        <artifactId>quarkus-micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>

melloware, Sep 17 '24

I understand what's going on now... You have configured the Prometheus scraper in the PR, while I'm using the Micrometer OTLP registry, which pushes metrics. I think it makes sense to support both setups; however, the visualizations will be different.

I wonder if the scraper could be activated by a property. I'm not sure if it should be on or off by default. Ideally, all the output, in the future, should be OTLP (OpenTelemetry).

brunobat, Sep 17 '24

It's just querying the Micrometer metric process_start_time_seconds ("expr": "process_start_time_seconds * 1000"), which is exposed in the Prometheus metrics:

{
  "datasource": {
    "type": "prometheus",
    "uid": "prometheus"
  },
  "expr": "process_start_time_seconds * 1000",
  "format": "time_series",
  "intervalFactor": 2,
  "legendFormat": "",
  "metric": "",
  "refId": "A",
  "step": 14400
}
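The * 1000 in that expression turns the gauge, which holds the process start time as seconds since the Unix epoch, into epoch milliseconds, presumably because the panel formats the value as a date/time, which Grafana expects in milliseconds. A minimal Java sketch of the same arithmetic; the class name and the sample value are made up for illustration:

```java
import java.time.Instant;

public final class StartTime {

    // process_start_time_seconds is epoch seconds (possibly fractional);
    // multiplying by 1000 gives epoch milliseconds, as in the panel's expr.
    static long toEpochMillis(double startTimeSeconds) {
        return Math.round(startTimeSeconds * 1000.0);
    }

    // Uptime derived from the start time and a "now" reference in epoch millis.
    static long uptimeMillis(double startTimeSeconds, long nowEpochMillis) {
        return nowEpochMillis - toEpochMillis(startTimeSeconds);
    }

    public static void main(String[] args) {
        double scraped = 1726578000.5; // hypothetical scraped value
        long startMillis = toEpochMillis(scraped);
        System.out.println("started at " + Instant.ofEpochMilli(startMillis));
        System.out.println("uptime ms: " + uptimeMillis(scraped, System.currentTimeMillis()));
    }
}
```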

melloware, Sep 17 '24


:waning_crescent_moon: This workflow status is outdated as a new workflow run has been triggered.


Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit fcd314f5bba941efd580025629a23667a3e2bd5a.

:white_check_mark: The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

quarkus-bot[bot], Sep 17 '24

One solution can be the least common denominator and only show metrics available with both registries on the dashboard. What do you think @melloware ?

brunobat, Sep 19 '24

We need this!

edeandrea, Sep 26 '24

There are 3 types of problems: missing metrics, differing metric names, and metrics that are more relevant than others. The differences are so big that it's probably better to add 2 dashboards, one for Prometheus and another for OTLP.

I added a commit with an OTLP dashboard. We still need to solve some issues, detailed below.

Missing metrics on OTLP:

  • process.memory
  • jvm_gc_pause_milliseconds_count

Metrics with different names, already handled in the new dashboard:

| Component | Prometheus metric | OTel metric |
|---|---|---|
| Uptime | process_uptime_seconds{} | process_uptime_milliseconds{} / 1000 |
| Total number of requests | http_server_requests_seconds_count | http_server_requests_milliseconds_count |
| Average inbound request duration | rate(http_server_requests_seconds_sum{ }[4m]) / rate(http_server_requests_seconds_count{ }[4m]) | rate(http_server_requests_milliseconds_sum{ }[4m]) / rate(http_server_requests_milliseconds_count{ }[4m]) / 1000 |
| Maximum inbound request duration | http_server_requests_seconds_max | max(http_server_requests_milliseconds_bucket) / 1000 |
| Sum of the duration of every request | rate(http_server_requests_seconds_sum{ }[2m]) | rate(http_server_requests_milliseconds_sum{ }[2m]) / 1000 |
| JVM Process Memory | process_memory_vss_bytes{ } | N/A |
| Thread States | jvm_threads_states_threads{ } | jvm_threads_states{ } |
| Threads | Composed query with jvm_threads_*_threads and process_threads | Metrics follow jvm_threads_*; there's no process.threads |
| ... | ... | ... |
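The OTel queries divide by 1000 because those series are recorded in milliseconds, while the Prometheus ones are in seconds. The average-duration row works because both rates share the same window, so rate(sum) / rate(count) reduces to the increase of the sum divided by the increase of the count. A simplified Java sketch of that arithmetic over two scrapes, assuming no counter resets and ignoring the extrapolation PromQL's rate() performs; the Sample and Unit types are hypothetical:

```java
public final class AvgDuration {

    // Unit in which a timer's _sum counter is reported.
    enum Unit {
        SECONDS(1.0), MILLISECONDS(1000.0);

        final double perSecond;

        Unit(double perSecond) {
            this.perSecond = perSecond;
        }
    }

    // One scrape of a timer's _sum and _count counters.
    record Sample(double sum, double count) {}

    // Average request duration in seconds between two scrapes:
    // delta(sum) / delta(count), then normalized from the reporting unit to seconds.
    static double averageSeconds(Sample earlier, Sample later, Unit unit) {
        double deltaCount = later.count() - earlier.count();
        if (deltaCount <= 0) {
            return Double.NaN; // no requests in the window (or a counter reset)
        }
        double deltaSum = later.sum() - earlier.sum();
        return (deltaSum / deltaCount) / unit.perSecond;
    }

    public static void main(String[] args) {
        // Same traffic seen through both registries: 30 requests taking 3 s in total.
        double prom = averageSeconds(new Sample(10.0, 100), new Sample(13.0, 130), Unit.SECONDS);
        double otel = averageSeconds(new Sample(10_000.0, 100), new Sample(13_000.0, 130), Unit.MILLISECONDS);
        System.out.println(prom + " s vs " + otel + " s");
    }
}
```

Both calls describe the same traffic, which is why the /1000 lets one panel layout serve both registries.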

Long story short:

| Component | Prometheus metric | OTel metric |
|---|---|---|
| Temporal metrics | seconds | milliseconds |
| Threads | jvm_threads_*_threads | jvm_threads_* |
| Classes | jvm_classes_*_classes | jvm_classes_* |

Please note that the queries for OTLP need double-checking.
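The naming rules in the summary table are regular enough to apply mechanically. Below is a hypothetical Java helper (not part of Micrometer or Quarkus) that rewrites a Prometheus-style name into the OTLP-style name, covering only the three rules listed above. Temporal values still need the /1000 conversion, and metrics with no OTLP counterpart, such as process_threads, are not handled:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class MetricNames {

    // jvm_threads_<x>_threads or jvm_classes_<x>_classes: the unit suffix repeats the prefix.
    private static final Pattern REPEATED_SUFFIX =
            Pattern.compile("^(jvm_(threads|classes)_.+)_(threads|classes)$");

    static String prometheusToOtlp(String prometheusName) {
        Matcher m = REPEATED_SUFFIX.matcher(prometheusName);
        if (m.matches() && m.group(2).equals(m.group(3))) {
            return m.group(1); // drop the repeated _threads / _classes suffix
        }
        // Temporal metrics: seconds in Prometheus, milliseconds in OTLP.
        return prometheusName.replaceFirst("_seconds(?=_|$)", "_milliseconds");
    }

    public static void main(String[] args) {
        for (String name : List.of(
                "jvm_threads_states_threads",
                "jvm_classes_loaded_classes",
                "http_server_requests_seconds_count",
                "process_uptime_seconds")) {
            System.out.println(name + " -> " + prometheusToOtlp(name));
        }
    }
}
```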

Component relevance:

  • Having the max request duration is not very useful, so I left those panels blank... I would rather try to display percentiles: 90%, 99% and 99.9%.
  • The I/O overview just repeats the HTTP panels.
  • We should probably display a total error count or rate for 4xx and 5xx requests.
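Percentiles like p90, p99 and p99.9 can be derived from histogram buckets, provided the timers publish them (in Micrometer this is opt-in, for example via publishPercentileHistogram() on a Timer builder). In PromQL that is the job of histogram_quantile(). A simplified Java sketch of the linear interpolation histogram_quantile() performs over cumulative buckets, meant only to show the idea; the bucket layout is invented:

```java
import java.util.List;

public final class HistogramQuantile {

    // One cumulative bucket: count of observations <= le (upper bound).
    record Bucket(double le, double cumulativeCount) {}

    // Estimates quantile q (0..1) from buckets sorted by le, the last one being +Inf.
    static double quantile(double q, List<Bucket> buckets) {
        if (buckets.isEmpty() || q < 0 || q > 1) return Double.NaN;
        double total = buckets.get(buckets.size() - 1).cumulativeCount();
        if (total == 0) return Double.NaN;
        double rank = q * total;
        double prevLe = 0.0;    // durations are non-negative, so the first bucket starts at 0
        double prevCount = 0.0;
        for (Bucket b : buckets) {
            if (b.cumulativeCount() >= rank) {
                if (Double.isInfinite(b.le())) {
                    return prevLe; // rank lands in +Inf: report the highest finite bound
                }
                double inBucket = b.cumulativeCount() - prevCount;
                if (inBucket == 0) return b.le();
                // Linear interpolation inside the bucket that contains the rank.
                return prevLe + (b.le() - prevLe) * (rank - prevCount) / inBucket;
            }
            prevLe = b.le();
            prevCount = b.cumulativeCount();
        }
        return prevLe;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(
                new Bucket(0.1, 50), new Bucket(0.5, 90),
                new Bucket(1.0, 99), new Bucket(Double.POSITIVE_INFINITY, 100));
        for (double q : new double[] {0.9, 0.99, 0.999}) {
            System.out.println("p" + (q * 100) + " = " + quantile(q, buckets) + " s");
        }
    }
}
```

As with histogram_quantile(), the result is only as precise as the bucket boundaries, so p99.9 can only be distinguished from p99 if the buckets are fine-grained near the tail.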

brunobat, Sep 27 '24

@brunobat 👍🏻 👍🏻 ⭐

melloware, Sep 27 '24

Created an issue on the Grafana side to make the dashboards configurable... I had to do a hack to make them available.

brunobat, Sep 27 '24

@gastaldi @alesj can one of you please do an independent review? Thanks!

brunobat, Sep 30 '24

Folks, can we get this one in sooner rather than later please? The first iteration doesn't need to be picture perfect, we can always improve given how often Quarkus releases.

geoand, Sep 30 '24

If anyone could point me to some documentation on how to enable this, it would help :)

gastaldi, Sep 30 '24

https://quarkus.io/guides/observability-devservices-lgtm ?

edeandrea, Sep 30 '24

I followed the guide above and had trouble finding the dashboards on the main page. After some digging, I navigated to http://localhost:43915/dashboards and found them there. Can we star them so they show up on the main page? I think that would be a much better experience.

On a separate note, I noticed that quarkus create app --extension=rest-jackson,quarkus-observability-devservices-lgtm fails with

[ERROR] ❗  Cannot find a dependency matching 'quarkus-observability-devservices-lgtm', maybe a typo?
[ERROR] ❗  Unable to create project: Failed to create project because of invalid extensions

gastaldi, Sep 30 '24

Thanks for testing @gastaldi

melloware, Sep 30 '24

Created issue with follow up work: https://github.com/quarkusio/quarkus/issues/43599

brunobat, Sep 30 '24

Nice work all!

alesj, Sep 30 '24

Could you share a blog post or video showing the whole thing in action?

adriens, Oct 30 '24