grok_exporter
grok_exporter copied to clipboard
Time between loglines
As suggested in #46 the time between two loglines is quite often a useful statistic.
In one of the applications running on our cluster, we get loglines like the following :
[2018-11-26 11:12:26 +0000] INFO com.somejavaclass.redacted.processingnode.verticle.ProcessingJobVerticle Received processing job with jobId: 9bfb55a0-ae31-42c6-a2df-00160e65986c
[2018-11-26 11:12:26 +0000] INFO com.somejavaclass.redacted.processingnode.verticle.ProcessingJobVerticle Created path for source file: /data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8
[2018-11-26 11:12:26 +0000] FINE com.somejavaclass.redacted.http.DownloadVerticle Received download message : {"com.somejavaclass.redacted.http.DownloadVerticle.save.path.key":"/data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8","com.somejavaclass.redacted.http.DownloadVerticle.download.url.key":"https://redacted.host.name/source/2F7dMYtNsRkMJGexWfLz-y/Eexz2sMfTiE89Gr536fSMc"}
[2018-11-26 11:12:26 +0000] FINE com.somejavaclass.redacted.http.ZookeeperVerticle Received resolve url message.
[2018-11-26 11:12:26 +0000] FINE com.somejavaclass.redacted.http.ZookeeperVerticle Url 'https://redacted.host.name/source/2F7dMYtNsRkMJGexWfLz-y/Eexz2sMfTiE89Gr536fSMc' is not an internal url, using it unmodified.
[2018-11-26 11:12:26 +0000] FINE com.somejavaclass.redacted.http.DownloadVerticle Found https url, configuring all trusting ssl client options.
[2018-11-26 11:12:26 +0000] INFO com.somejavaclass.redacted.http.DownloadVerticle Downloading file from 'https://redacted.host.name:-1/source/2F7dMYtNsRkMJGexWfLz-y/Eexz2sMfTiE89Gr536fSMc' and saving to: /data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8
[2018-11-26 11:12:26 +0000] WARNING io.netty.util.internal.logging.Slf4JLogger warn Failed to find a usable hardware address from the network interfaces; using random bytes: d2:41:fa:d0:d3:8c:b6:b9
[2018-11-26 11:12:26 +0000] INFO com.somejavaclass.redacted.http.DownloadVerticle Beginning download...
[2018-11-26 11:12:29 +0000] FINE com.somejavaclass.redacted.http.DownloadVerticle Closed file /data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8.
[2018-11-26 11:12:29 +0000] INFO com.somejavaclass.redacted.processingnode.verticle.ProcessingJobVerticle Download finished from: https://redacted.host.name/source/2F7dMYtNsRkMJGexWfLz-y/Eexz2sMfTiE89Gr536fSMc. File saved at : /data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8
The time between Download started (11:12:26) and Download finished (11:12:29) for a job is actually really interesting to us.
For multiple events, my suggestion for the flow (least confusion and easiest to code) :
11:00 [event 2 fires] (without event 1), do nothing because we have nothing to time against
11:01 [event 1 fires] log time.
11:02 [event 2 fires] check for a tracked event 1 and store the time between now and then and output the metric
11:03 [event 1 fires] log time
11:04 [event 1 fires] log time, overwrite slot
11:05 [event 2 fires] check for a tracked event 1 and store the time between now and then and output the metric (1 minute)
11:06 [event 2 fires] check for a tracked event 1 and store the time between now and then and output the metric (2 minutes because 11:06-11:04)
Whilst this doesn't give the ability to correlate between events, it's the least surprising and easiest to code. If there is some kind of correlation ID and events might nest or fire asynchronously, a regex for correlation id might be a nice to have, but that requires more storage, and lots more metric buckets.
@fstab thoughts?
This sounds like a bug. It does not seem related to missing settings. The only thing that starts a thread is the built-in server https://github.com/fstab/promagent/blob/master/promagent-framework/promagent-internal/src/main/java/io/promagent/internal/BuiltInServer.java. If you start promagent without the port parameter this server will not be started. You could try that to check if this is the process preventing shutdown.
Indeed, after starting without the port, the process shuts down gracefully. As a solution: maybe you can register a VM-shutdownhook that provides a httpServer.stop(...) to gracefully shutdown the httpServer instance? P.S. not sure how this affects non-standalone-Tomcat usages of the promagent though (I did not look into it (yet) though).
It would be interesting to see if a shutdown hook works. If you try this and it works, please open a pull request.
Alternatively, there are other ways to expose the metrics (see https://github.com/fstab/promagent#exposing-metrics):
- You can deploy
promagent.warand it will expose the metrics. - Metrics are exposed via JMX, so you could for example use github.com/prometheus/jmx_exporter to expose the metrics.
A yeah, ok, that's right. I blindly went with the javaagent approach some time ago and forgot all about those alternatives. Since i already use the JMX exporter for some other cases - on the same Tomcat app and thus already present - i'll first go with that one and try that path.
The promagent.war approach like promising also, it fits the Catalina approach, and i'll definitely try that one also.
About the shutdown hook; if i find the time i'll definitely try that one too. And if it's valuable, i'll open a pull request.
Thanks for the help (and i'll confirm if those alternatives work for me) !