grok_exporter icon indicating copy to clipboard operation
grok_exporter copied to clipboard

Time between loglines

Open iMartyn opened this issue 6 years ago • 1 comments
trafficstars

As suggested in #46 the time between two loglines is quite often a useful statistic.

In one of the applications running on our cluster, we get loglines like the following :

[2018-11-26 11:12:26 +0000] INFO   com.somejavaclass.redacted.processingnode.verticle.ProcessingJobVerticle Received processing job with jobId: 9bfb55a0-ae31-42c6-a2df-00160e65986c
[2018-11-26 11:12:26 +0000] INFO   com.somejavaclass.redacted.processingnode.verticle.ProcessingJobVerticle Created path for source file: /data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8
[2018-11-26 11:12:26 +0000] FINE   com.somejavaclass.redacted.http.DownloadVerticle Received download message : {"com.somejavaclass.redacted.http.DownloadVerticle.save.path.key":"/data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8","com.somejavaclass.redacted.http.DownloadVerticle.download.url.key":"https://redacted.host.name/source/2F7dMYtNsRkMJGexWfLz-y/Eexz2sMfTiE89Gr536fSMc"}
[2018-11-26 11:12:26 +0000] FINE   com.somejavaclass.redacted.http.ZookeeperVerticle Received resolve url message.
[2018-11-26 11:12:26 +0000] FINE   com.somejavaclass.redacted.http.ZookeeperVerticle Url 'https://redacted.host.name/source/2F7dMYtNsRkMJGexWfLz-y/Eexz2sMfTiE89Gr536fSMc' is not an internal url, using it unmodified.
[2018-11-26 11:12:26 +0000] FINE   com.somejavaclass.redacted.http.DownloadVerticle Found https url, configuring all trusting ssl client options.
[2018-11-26 11:12:26 +0000] INFO   com.somejavaclass.redacted.http.DownloadVerticle Downloading file from 'https://redacted.host.name:-1/source/2F7dMYtNsRkMJGexWfLz-y/Eexz2sMfTiE89Gr536fSMc' and saving to: /data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8
[2018-11-26 11:12:26 +0000] WARNING io.netty.util.internal.logging.Slf4JLogger warn Failed to find a usable hardware address from the network interfaces; using random bytes: d2:41:fa:d0:d3:8c:b6:b9
[2018-11-26 11:12:26 +0000] INFO   com.somejavaclass.redacted.http.DownloadVerticle Beginning download...
[2018-11-26 11:12:29 +0000] FINE   com.somejavaclass.redacted.http.DownloadVerticle Closed file /data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8.
[2018-11-26 11:12:29 +0000] INFO   com.somejavaclass.redacted.processingnode.verticle.ProcessingJobVerticle Download finished from: https://redacted.host.name/source/2F7dMYtNsRkMJGexWfLz-y/Eexz2sMfTiE89Gr536fSMc. File saved at : /data/input/0b487e4a-4d65-49b8-a321-4b8a1f83d6b8

The time between Download started (11:12:26) and Download finished (11:12:29) for a job is actually really interesting to us.

For multiple events, my suggestion for the flow (least confusion and easiest to code) :

11:00 [event 2 fires] (without event 1), do nothing because we have nothing to time against
11:01 [event 1 fires] log time.
11:02 [event 2 fires] check for a tracked event 1 and store the time between now and then and output the metric
11:03 [event 1 fires] log time
11:04 [event 1 fires] log time, overwrite slot
11:05 [event 2 fires] check for a tracked event 1 and store the time between now and then and output the metric (1 minute)
11:06 [event 2 fires] check for a tracked event 1 and store the time between now and then and output the metric (2 minutes because 11:06-11:04)

Whilst this doesn't give the ability to correlate between events, it's the least surprising and easiest to code. If there is some kind of correlation ID and events might nest or fire asynchronously, a regex for correlation id might be a nice to have, but that requires more storage, and lots more metric buckets.

@fstab thoughts?

iMartyn avatar Nov 26 '18 12:11 iMartyn

This sounds like a bug. It does not seem related to missing settings. The only thing that starts a thread is the built-in server https://github.com/fstab/promagent/blob/master/promagent-framework/promagent-internal/src/main/java/io/promagent/internal/BuiltInServer.java. If you start promagent without the port parameter this server will not be started. You could try that to check if this is the process preventing shutdown.

fstab avatar Apr 02 '19 20:04 fstab

Indeed, after starting without the port, the process shuts down gracefully. As a solution: maybe you can register a VM-shutdownhook that provides a httpServer.stop(...) to gracefully shutdown the httpServer instance? P.S. not sure how this affects non-standalone-Tomcat usages of the promagent though (I did not look into it (yet) though).

WimLambrecht avatar Apr 03 '19 07:04 WimLambrecht

It would be interesting to see if a shutdown hook works. If you try this and it works, please open a pull request.

Alternatively, there are other ways to expose the metrics (see https://github.com/fstab/promagent#exposing-metrics):

  • You can deploy promagent.war and it will expose the metrics.
  • Metrics are exposed via JMX, so you could for example use github.com/prometheus/jmx_exporter to expose the metrics.

fstab avatar Apr 03 '19 20:04 fstab

A yeah, ok, that's right. I blindly went with the javaagent approach some time ago and forgot all about those alternatives. Since i already use the JMX exporter for some other cases - on the same Tomcat app and thus already present - i'll first go with that one and try that path.

The promagent.war approach like promising also, it fits the Catalina approach, and i'll definitely try that one also.

About the shutdown hook; if i find the time i'll definitely try that one too. And if it's valuable, i'll open a pull request.

Thanks for the help (and i'll confirm if those alternatives work for me) !

WimLambrecht avatar Apr 04 '19 06:04 WimLambrecht