solr icon indicating copy to clipboard operation
solr copied to clipboard

SOLR-10654: Introduce output of Prometheus metrics directly from Solr

Open mlbiscoc opened this issue 1 year ago • 17 comments

https://issues.apache.org/jira/browse/SOLR-10654

Description

Introduce a new Prometheus response writer /solr/admin/metrics?wt=prometheus to output Dropwizards Solr Core registry metrics directly from Solr. Output looks similar to the prometheus exporter. This exports solr.core.* solr.node solr.jvm solr.jetty metric registries

Solution

Currently, Solr metrics are collected by Dropwizard and differentiated by metric names. This is an anti-pattern to Prometheus. Manually export Dropwizard Solr core registry metrics to a new SolrPrometheusExporter base class to parse out the labels from the metric name and output in a Prometheus readable format for the Prometheus Response writer. The exporter is just temporarily created to export a snapshot of what the current Dropwizard registry is at. These are not held in memory.

The exporter will categorize these metrics and export to prometheus format with labels depending on the category and metric names.

In standalone, all labels for the solr.core registry includescore. In cloud mode, all metrics include core, collection, shard, replica

NOTE: Not all metrics are possible to convert to Prometheus such as Dropwizard string metrics. This is because Prometheus does not support non-numeric types

Tests

Added MetricsHandlerTest, SolrPrometheusExporterTest, TestPrometheusReponseWriter

Checklist

Please review the following and check all that apply:

  • [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • [x] I have created a Jira issue and added the issue ID to my pull request title.
  • [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • [x] I have developed this patch against the main branch.
  • [x] I have run ./gradlew check.
  • [x] I have added tests for my changes.
  • [x] I have added documentation for the Reference Guide

mlbiscoc avatar Apr 17 '24 15:04 mlbiscoc

Moving this out of draft for review

mlbiscoc avatar May 02 '24 20:05 mlbiscoc

There are enough additional dependencies that I feel strongly this should be another module.

WDYT?

dsmiley avatar May 07 '24 13:05 dsmiley

There are enough additional dependencies that I feel strongly this should be another module.

WDYT?

Sorry, I somehow completely missed this comment. When you say be another module, do you mean move it to solr/modules that needs to be enabled? If so, I think that prometheus metrics is valuable enough to be part of core and enabled by default but I also only added 2 additional dependencies. Is that enough to be its own module?

mlbiscoc avatar May 07 '24 14:05 mlbiscoc

io.prometheus:prometheus-metrics-config:1.1.0 (1 constraints: 3e164ed8)
io.prometheus:prometheus-metrics-exposition-formats:1.1.0 (1 constraints: 0405f335)
io.prometheus:prometheus-metrics-model:1.1.0 (2 constraints: 411b133b)
io.prometheus:prometheus-metrics-shaded-protobuf:1.1.0 (1 constraints: 

No, there are 4 dependencies. Only the last one concerns me. Why is protobuf needed at all? Can it be excluded? Can any of the others be excluded as well, while you're at it?

dsmiley avatar May 07 '24 14:05 dsmiley

io.prometheus:prometheus-metrics-config:1.1.0 (1 constraints: 3e164ed8)
io.prometheus:prometheus-metrics-exposition-formats:1.1.0 (1 constraints: 0405f335)
io.prometheus:prometheus-metrics-model:1.1.0 (2 constraints: 411b133b)
io.prometheus:prometheus-metrics-shaded-protobuf:1.1.0 (1 constraints: 

No, there are 4 dependencies. Only the last one concerns me. Why is protobuf needed at all? Can it be excluded? Can any of the others be excluded as well, while you're at it?

Ah ok you were correct. I went ahead and removed the transitive dependencies prometheus-metrics-shaded-protobuf and prometheus-metrics-config so there should now only be 2

mlbiscoc avatar May 07 '24 17:05 mlbiscoc

@dsmiley or anyone else interested in reviewing. Was curious for more feedback on approach. Is this going in the correct direction or do some disagree if this is correct implementation path towards prometheus in Solr.

mlbiscoc avatar May 15 '24 20:05 mlbiscoc

Much better. I know I've been a bit picky/detailed but I'm trying to achieve a level of clarity & conciseness here that will end up being repeated for the other metrics.

What do you think the next step should be? Totally completing the core metrics so that you have parity with the default Prometheus Exporter config?

No worries, I'd prefer you be picky than someone in the future seeing this and it makes no sense to them. I would say next steps are to export the other 3 registries solr.jetty, solr.jvm, solr.node and yes getting as close to the prometheus exporters config as possible. Luckily, in my opinion this was the hardest registry (hopefully I was right) and the ground work from this would be in place to do the rest.

mlbiscoc avatar May 23 '24 21:05 mlbiscoc

What's here looks good to me. Just need to add more; lots more metrics. Maybe it'll evolve a little once you add another category. My interest on this at-work is not the core metrics but all the other ones (Solr, JVM, Jetty). I'd then be inclined to take this to our Solr fork to kick the tires for real world usage.

dsmiley avatar May 31 '24 05:05 dsmiley

What's here looks good to me. Just need to add more; lots more metrics. Maybe it'll evolve a little once you add another category. My interest on this at-work is not the core metrics but all the other ones (Solr, JVM, Jetty). I'd then be inclined to take this to our Solr fork to kick the tires for real world usage.

Thanks. I've been caught up in some other things recently but plan on continuing on the other registries somewhat soon and it shouldn't be as bad to export. Would it be better for me to continue and commit them to this branch, or was this planned to be merged and export the other 3 registries in a separate PR?

mlbiscoc avatar May 31 '24 13:05 mlbiscoc

It doesn't feel right to merge this PR right now. Needs docs. Also the choice of what metrics are done so far is inverted from the value that would make most sense for this whole approach -- OS/JVM/node issues are highest value.

dsmiley avatar May 31 '24 15:05 dsmiley

It doesn't feel right to merge this PR right now. Needs docs. Also the choice of what metrics are done so far is inverted from the value that would make most sense for this whole approach -- OS/JVM/node issues are highest value.

@dsmiley Just finished up exporting solr.node solr.jvm and solr.jetty metrics to prometheus and is ready for another round of review whenever you are. I updated the description above and attached files if you want to see what the prometheus output looks like. Last thing I did not finish writing up is docs. I'm assuming the 2 places to update would be the metrics-api section and possibly Monitoring with Prometheus and Grafana?

mlbiscoc avatar Jun 10 '24 18:06 mlbiscoc

Added ref-guide documentation on Prometheus endpoint

mlbiscoc avatar Jun 11 '24 18:06 mlbiscoc

I'd like to suggest an integration test showing all the output (say to a file) stripped of numbers (to remove brittle/changing stuff) to show that we export what we mean to export. I could imagine someone renaming some metric and then forgetting to update the code here.

Thanks for reviewing again. Added in an integration test with a file stripped of values. Also the unit tests I wrote originally should catch if any changes that happen to the naming of the prometheus metric or labels.

Also thanks for catching missing javadocs. Made sure to add them in for each class and added sample values for what is being exported to prometheus.

mlbiscoc avatar Jun 18 '24 15:06 mlbiscoc

I tried this PR locally using the "techproducts" example. I thing I noticed is that the metrics filtering appears to not work, which is a big problem. A test should be added for this, and it should work.

http://localhost:8983/solr/admin/metrics?category=solr.jetty&wt=prometheus

Filtering does work I think the parameter you are looking for is group. Try localhost:8983/solr/admin/metrics?group=jvm&wt=prometheus and you can change it with jetty node core for the corresponding registries

But still you are correct, I should have a test for this. Will add one in.

The only other thing I remember now about this is that regex, key, expr parameters for filtering will not work with this prometheus wt. I did that on purpose because those 3 parameters are for filtering based on the name of the metric while the others are kind of more generic such as registry or type filtering. I didn't think it made sense because prometheus naming is different from the dropwizard naming after exporting. For example: solr\.core\..*:QUERY\..*\.requestTimes:max_ms is not an output of prometheus but to allow expr filtering around that felt odd.

Even then if you need extensive filtering or trying to get some specific metric, maybe the Prometheus Exporter process with a custom config might be the better solution.

mlbiscoc avatar Jun 28 '24 20:06 mlbiscoc

Oops; maybe I was hurried. I'm glad the category works at least! The other filtering options could be deferred to a future JIRA by matching the prometheus name. It's a very useful capability; some metrics implemented as gauges are expensive to compute, and others are just excessive output that aren't needed for O(N) for N cores if N is for some users quite high.

dsmiley avatar Jun 28 '24 22:06 dsmiley

Oops; maybe I was hurried. I'm glad the category works at least! The other filtering options could be deferred to a future JIRA by matching the prometheus name. It's a very useful capability; some metrics implemented as gauges are expensive to compute, and others are just excessive output that aren't needed for O(N) for N cores if N is for some users quite high.

Added in a test for the filtering. I'd be happy expand on this and add further filtering feature specific to prometheus after this is ready to be merged.

mlbiscoc avatar Jul 01 '24 20:07 mlbiscoc

This is super close now. Can you propose a brief CHANGES.txt under 9.7 "New Features" (I think it's closer to this than an improvement).

Added entry into CHANGES.txt

mlbiscoc avatar Jul 02 '24 01:07 mlbiscoc