LGTM Quarkus Dashboard
Fix #40933: LGTM Quarkus Dashboard
cc @brunobat @alesj
- Adds Quarkus Micrometer Dashboard including JVM, HTTP, JDBC and other stats provided by Micrometer.
- I just need some guidance on the TODO about how to get those values properly.
OK, I resolved that issue.
:waning_crescent_moon: This workflow status is outdated as a new workflow run has been triggered.
Status for workflow Quarkus CI
This is the status report for running Quarkus CI on commit 1c41146479bd1bb7432f3a3363655128b9b45642.
:white_check_mark: The latest workflow run for the pull request has completed successfully.
It should be safe to merge provided you have a look at the other checks in the summary.
You can consult the Develocity build scans.
@melloware, back from vacations. I wonder if we could deploy this dashboard on the grafana repository, as well. Will need some time to review this.
Yeah, I put this here because I wasn't sure if we wanted to get feedback from other Quarkus users on whether there's anything missing that they'd want on there.
I also think the colors and the time ranges for uptime might need to be tweaked, but I thought it was a pretty good first stab 😄
Also, we should pull up the HTTP parts.
Yes, this was more of an idea, and we need to check the startup-time panels on the dashboard; those eventually fill in if you leave the app running.
@brunobat OK, I updated it to move HTTP Endpoints up. Here is what mine looks like. Not sure why yours shows N/A at first, unless you are not waiting 30 seconds for the scrape?
@melloware strange... I waited a few minutes. What app are you using to feed data to the dashboard? I'd like to take a look at the dependencies.
Hmm, I was just using a REST app with:
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-jackson</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-observability-devservices-lgtm</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-opentelemetry</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-micrometer-registry-prometheus</artifactId>
</dependency>
I understand what's going on now... You have configured the prometheus scraper in the PR and I'm using the Micrometer OTLP registry that pushes things. I think it makes sense to have both setups, however the visualizations will be different.
I wonder if the scraper could be activated by a property. I'm not sure if it should be on or off by default. Ideally, all the output, in the future, should be OTLP (OpenTelemetry).
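For instance (a sketch; the property name assumes the defaults of the quarkus-micrometer-registry-prometheus extension), something like this in application.properties could toggle the Prometheus registry and its scrape endpoint:

```properties
# Sketch: enable/disable the Micrometer Prometheus registry
# (property name assumed from the standard Micrometer extension configuration)
quarkus.micrometer.export.prometheus.enabled=true
```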
It's just checking the Micrometer metric "expr": "process_start_time_seconds * 1000", which is exposed in the Prometheus metrics.
{
    "datasource": {
        "type": "prometheus",
        "uid": "prometheus"
    },
    "expr": "process_start_time_seconds * 1000",
    "format": "time_series",
    "intervalFactor": 2,
    "legendFormat": "",
    "metric": "",
    "refId": "A",
    "step": 14400
}
One solution could be the least common denominator: only show metrics available with both registries on the dashboard. What do you think, @melloware?
We need this!
There are three types of problems: missing metrics, metrics with different names, and metrics that are more relevant than others. The differences are so big that it's probably better to add two dashboards, one for Prometheus and another for OTLP.
I added a commit with an OTLP dashboard. We need to solve some issues detailed below.
Missing metrics on OTLP:
- process.memory
- jvm_gc_pause_milliseconds_count
Metrics with different names but fixed on the new dashboard:
| Component | Prometheus metric | OTel metric |
|---|---|---|
| Uptime | process_uptime_seconds{} | process_uptime_milliseconds{} /1000 |
| Total number of requests | http_server_requests_seconds_count | http_server_requests_milliseconds_count |
| Average inbound request duration | rate(http_server_requests_seconds_sum{ }[4m]) / rate(http_server_requests_seconds_count{ }[4m]) | rate(http_server_requests_milliseconds_sum{ }[4m]) / rate(http_server_requests_milliseconds_count{ }[4m]) / 1000 |
| Maximum inbound request duration | http_server_requests_seconds_max | max(http_server_requests_milliseconds_bucket) / 1000 |
| Sum of the duration of every request | rate(http_server_requests_seconds_sum{ }[2m]) | rate(http_server_requests_milliseconds_sum{ }[2m]) / 1000 |
| JVM Process Memory | process_memory_vss_bytes{ } | NA |
| Thread States | jvm_threads_states_threads{ } | jvm_threads_states{ } |
| Threads | composed query with jvm_threads_*_threads and process_threads | metrics follow jvm_threads_* and there's no process.threads |
| ... | ... | ... |
Long story short:
| Component | Prometheus metric | OTel metric |
|---|---|---|
| Temporal metrics | seconds | milliseconds |
| Threads | jvm_threads_*_threads | jvm_threads_* |
| Classes | jvm_classes_*_classes | jvm_classes_* |
Please note that the queries for OTLP need double-checking.
Component relevance:
- Having the max request duration is not very useful, and I left it blank... I would rather try to display percentiles: 90%, 99%, and 99.9%.
- The I/O overview is just a repetition of the HTTP panels.
- We should probably display a total error count or rate for 4xx and 5xx requests.
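For the percentile idea, a PromQL sketch (assuming the OTLP histograms are exported as http_server_requests_milliseconds_bucket, as in the metric-name table above; label filters omitted):

```
# p99 of inbound request duration, converted from milliseconds to seconds
histogram_quantile(0.99,
  sum(rate(http_server_requests_milliseconds_bucket[5m])) by (le)) / 1000
```

The same query with 0.90 and 0.999 would cover the other two suggested percentiles.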
@brunobat 👍🏻 👍🏻 ⭐
Created an issue on the Grafana side to make the dashboards configurable... I had to do a hack to make them available.
@gastaldi @alesj can one of you please do an independent review? Thanks!
Folks, can we get this one in sooner rather than later please? The first iteration doesn't need to be picture perfect, we can always improve given how often Quarkus releases.
If anyone could point me to some documentation on how to enable this, it would help :)
https://quarkus.io/guides/observability-devservices-lgtm ?
I followed the guide above and had trouble finding the dashboards on the main page. After some digging, I navigated to http://localhost:43915/dashboards and found them there. Can we star them so they show up on the main page? I think that would be a much better experience.
On a separate note, I noticed that quarkus create app --extension=rest-jackson,quarkus-observability-devservices-lgtm fails with
[ERROR] ❗ Cannot find a dependency matching 'quarkus-observability-devservices-lgtm', maybe a typo?
[ERROR] ❗ Unable to create project: Failed to create project because of invalid extensions
Thanks for testing @gastaldi
Created issue with follow up work: https://github.com/quarkusio/quarkus/issues/43599
Nice work all!
Would you share a blog post or video showing the whole thing in action?