LGTM Quarkus Dashboard
Fix #40933: LGTM Quarkus Dashboard
cc @brunobat @alesj
- Adds Quarkus Micrometer Dashboard including JVM, HTTP, JDBC and other stats provided by Micrometer.
- I just need some guidance on the TODO about how to get those values properly.
OK, I resolved that issue.
:waning_crescent_moon: This workflow status is outdated as a new workflow run has been triggered.
Status for workflow Quarkus CI
This is the status report for running Quarkus CI on commit 1c41146479bd1bb7432f3a3363655128b9b45642.
:white_check_mark: The latest workflow run for the pull request has completed successfully.
It should be safe to merge provided you have a look at the other checks in the summary.
You can consult the Develocity build scans.
@melloware, back from vacations. I wonder if we could deploy this dashboard on the grafana repository, as well. Will need some time to review this.
Yeah, I put this here because I wasn't sure if we wanted to get feedback from other Quarkus users on whether there's anything missing that they'd want on there.
I also think the colors and the time ranges for uptime might need to be tweaked, but I thought it was a pretty good first stab 😄
Also, we should pull up the HTTP parts.
Yes, this was more of an idea, and we need to check the startup-time panels on the dashboard; those eventually fill in if you leave the app running.
@brunobat OK, I updated it to move HTTP Endpoints up. Here is what mine looks like. Not sure why yours shows N/A at first, unless you are not waiting 30 seconds for the scrape?
@melloware strange... I waited a few minutes. What app are you using to feed data to the dashboard? I'd like to take a look at the dependencies.
Hmm, I was just using a REST app with:
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-jackson</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-observability-devservices-lgtm</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-opentelemetry</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-micrometer-registry-prometheus</artifactId>
</dependency>
I understand what's going on now... You have configured the prometheus scraper in the PR and I'm using the Micrometer OTLP registry that pushes things. I think it makes sense to have both setups, however the visualizations will be different.
I wonder if the scraper could be activated by a property. I'm not sure if it should be on or off by default. Ideally, all the output, in the future, should be OTLP (OpenTelemetry).
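For instance (a sketch; the property name assumes the defaults of the quarkus-micrometer-registry-prometheus extension), something like this in application.properties could toggle the Prometheus registry and its scrape endpoint:

```properties
# Sketch: enable/disable the Micrometer Prometheus registry
# (property name assumed from the standard Micrometer extension configuration)
quarkus.micrometer.export.prometheus.enabled=true
```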
It's just checking the Micrometer metric "expr": "process_start_time_seconds * 1000", which is exposed in the Prometheus metrics.
{
    "datasource": {
        "type": "prometheus",
        "uid": "prometheus"
    },
    "expr": "process_start_time_seconds * 1000",
    "format": "time_series",
    "intervalFactor": 2,
    "legendFormat": "",
    "metric": "",
    "refId": "A",
    "step": 14400
}
One solution could be the least common denominator: only show metrics available with both registries on the dashboard. What do you think, @melloware?
We need this!
There are three types of problems: missing metrics, metrics with different names, and metrics that are more relevant than others. The differences are so big that it's probably better to add two dashboards, one for Prometheus and another for OTLP.
I added a commit with an OTLP dashboard. We need to solve some issues detailed below.
Missing metrics on OTLP:
- process.memory
- jvm_gc_pause_milliseconds_count
Metrics with different names but fixed on the new dashboard:
| Component | Prometheus metric | OTel metric |
|---|---|---|
| Uptime | process_uptime_seconds{} | process_uptime_milliseconds{} /1000 |
| Total number of requests | http_server_requests_seconds_count | http_server_requests_milliseconds_count |
| Average inbound request duration | rate(http_server_requests_seconds_sum{ }[4m]) / rate(http_server_requests_seconds_count{ }[4m]) | rate(http_server_requests_milliseconds_sum{ }[4m]) / rate(http_server_requests_milliseconds_count{ }[4m]) / 1000 |
| Maximum inbound request duration | http_server_requests_seconds_max | max(http_server_requests_milliseconds_bucket) / 1000 |
| Sum of the duration of every request | rate(http_server_requests_seconds_sum{ }[2m]) | rate(http_server_requests_milliseconds_sum{ }[2m]) / 1000 |
| JVM Process Memory | process_memory_vss_bytes{ } | NA |
| Thread States | jvm_threads_states_threads{ } | jvm_threads_states{ } |
| Threads | composed query with jvm_threads_*_threads and process_threads | metrics follow jvm_threads_* and there's no process.threads |
| ... | ... | ... |
Long story short:
| Component | Prometheus metric | OTel metric |
|---|---|---|
| Temporal metrics | seconds | milliseconds |
| Threads | jvm_threads_*_threads | jvm_threads_* |
| Classes | jvm_classes_*_classes | jvm_classes_* |
Please note that the queries for OTLP need double-checking.
Component relevance:
- Having the max request duration is not very useful, and I left it blank... I would rather try to display percentiles: 90%, 99%, and 99.9%.
- The I/O overview is just a repetition of the HTTP panels.
- We should probably display a total error count or rate for 4xx and 5xx requests.
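For the percentile idea, a PromQL sketch (assuming the OTLP histograms are exported as http_server_requests_milliseconds_bucket, as in the metric-name table above; label filters omitted):

```
# p99 of inbound request duration, converted from milliseconds to seconds
histogram_quantile(0.99,
  sum(rate(http_server_requests_milliseconds_bucket[5m])) by (le)) / 1000
```

The same query with 0.90 and 0.999 would cover the other two suggested percentiles.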
@brunobat 👍🏻 👍🏻 ⭐
Created an issue on the Grafana side to make the dashboards configurable... I had to do a hack to make them available.
@gastaldi @alesj can one of you please do an independent review? Thanks!
Folks, can we get this one in sooner rather than later please? The first iteration doesn't need to be picture perfect, we can always improve given how often Quarkus releases.
If anyone could point me to some documentation on how to enable this, it would help :)
https://quarkus.io/guides/observability-devservices-lgtm ?
I followed the guide above and had trouble finding the dashboards on the main page. After some digging, I navigated to http://localhost:43915/dashboards and found them there. Can we star them so they show up on the main page? I think that would be a much better experience.
On a separate note, I noticed that quarkus create app --extension=rest-jackson,quarkus-observability-devservices-lgtm fails with
[ERROR] ❗ Cannot find a dependency matching 'quarkus-observability-devservices-lgtm', maybe a typo?
[ERROR] ❗ Unable to create project: Failed to create project because of invalid extensions
Thanks for testing @gastaldi
Created issue with follow up work: https://github.com/quarkusio/quarkus/issues/43599
Nice work all!
Would you share a blog post or video showing the whole thing in action?