goss
Add Prometheus HTTP Endpoint
It would be nice for goss validations to be consumable by Prometheus.io. I will work on this a little.
I know there was a previous effort to implement this in PR https://github.com/aelsabbahy/goss/pull/175
Might be helpful
@aelsabbahy,
Currently I have something basic working with the prometheus client.
https://github.com/Smithx10/goss/commit/b336a7e146843393f4a5266495cf62525f2ae131
I'd like the metric name to be based on the resource type, but I couldn't figure out how to vary the Name attribute passed to the NewGaugeVec constructor without causing Prometheus to panic because of duplicates.
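import "github.com/prometheus/client_golang/prometheus"

// A single gauge vector; each assertion is distinguished by its labels.
var gossGauge = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "goss",
		Help: "Lets you know if goss assertions were true 0, or false 1",
	},
	[]string{"resource_type", "resource_id", "property", "title"},
)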
Do you think these gauges should be specific to each resource type?
Here is the output:
# HELP goss Lets you know if goss assertions were true 0, or false 1
# TYPE goss gauge
goss{property="exists",resource_id="bruce.smith",resource_type="User",title=""} 1
goss{property="exists",resource_id="jim",resource_type="User",title=""} 1
goss{property="exists",resource_id="smith",resource_type="User",title=""} 1
goss{property="ip",resource_id="tcp:8080",resource_type="Port",title=""} 0
goss{property="listening",resource_id="tcp:8080",resource_type="Port",title=""} 1
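On the duplicate panic mentioned above: it typically appears when a second collector is registered under a name that is already taken. One possible way to get per-resource-type metric names is to create each gauge once and reuse it afterwards. The sketch below only models that idea with a plain map standing in for the Prometheus registry; `gaugeCache`, `metricName`, and the `goss_<type>` naming scheme are hypothetical and not taken from the linked commit.

```go
package main

import (
	"fmt"
	"strings"
)

// gauge is a stand-in for a labelled Prometheus gauge; it only stores values.
type gauge struct {
	name   string
	values map[string]float64 // keyed by "resource_id/property"
}

// gaugeCache creates each per-resource-type gauge exactly once.
type gaugeCache struct {
	byName map[string]*gauge
}

func newGaugeCache() *gaugeCache {
	return &gaugeCache{byName: make(map[string]*gauge)}
}

// metricName derives a hypothetical metric name such as "goss_user".
func metricName(resourceType string) string {
	return "goss_" + strings.ToLower(resourceType)
}

// get returns the existing gauge for a resource type, or creates it on first use.
// A real implementation would register the new gauge at the point of creation,
// so registration happens once per name.
func (c *gaugeCache) get(resourceType string) *gauge {
	name := metricName(resourceType)
	if g, ok := c.byName[name]; ok {
		return g
	}
	g := &gauge{name: name, values: make(map[string]float64)}
	c.byName[name] = g
	return g
}

func main() {
	cache := newGaugeCache()
	cache.get("User").values["jim/exists"] = 1
	cache.get("User").values["smith/exists"] = 1
	cache.get("Port").values["tcp:8080/listening"] = 1
	fmt.Println(len(cache.byName), "gauges created")
}
```

Whether per-type names are preferable to a single `goss` metric with a `resource_type` label is still the open question in this thread; the sketch only shows how the duplicate creation could be avoided.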
Hey @Smithx10 👋 I once had a go at this (the original #175 attempt) but wasn't really sure what the expected output or use should be at the time. I'd be glad to collaborate and work out how goss and Prometheus could work happily together.
I have a question about the actual value / metric produced: should it be binary 1 / 0, or would the time it took to complete the check be useful for identifying slow DNS or HTTP responses, or any unusual slowness of a resource like the filesystem?
I had thought that a simple true or false value would be better suited to a metric label. I don't use Prometheus, so maybe you have some experience or opinions you can share in terms of its usage and what good looks like in the Prometheus world?
I haven't used Prometheus myself so I can't really comment on this.
If you guys come to an agreement, it makes sense to add this to goss. If there end up being many different opinions, I'm wondering if it makes sense as a sample script in extra/ that parses the goss JSON output and reformats it?
What do you guys think?
Thanks @aelsabbahy. I made use of this quick docker-compose Prometheus stack: https://github.com/vegasbrianc/prometheus and came up with #363, which is similar to the above idea / output but formats results in the Prometheus text format rather than using the client library.
I'm thinking of creating a Prometheus collector instead, in order to get metric names similar to the ones @pysysops has. Hopefully I'll get some time and figure it out.
@pysysops ,
I'm not really using goss as a tool for health checking applications, because I really don't believe that's the ethos of the project. All I am interested in, as a Prometheus / Grafana user, is finding out when my configuration drifted, how long it drifted for, and whether it came back from the drift. This is valuable if multiple actors are acting upon an infrastructure, e.g. someone logged in and changed a configuration, an event handler, etc.
Does that make sense?
Isn't one possible way of getting Prometheus and goss to work together to use goss serve and the blackbox_exporter (https://github.com/prometheus/blackbox_exporter) to check the HTTP status code?
It seems work on this started and stopped multiple times with no agreement on a solution.
Does anyone here know: A. whether this is possible, or whether everyone's needs are unique? B. whether it makes sense for this to be a goss output format?
@aelsabbahy Hi, I am using goss and need this feature. Please add a Prometheus exporter to goss.
@karimiehsan90 This will have to be submitted by a contributor, since I have personally never used Prometheus. That works as long as the submitted PR is agreed upon.
The one limitation I will say is that goss can have a Prometheus output format, but it should not be pushing results over the network.
Hi
Please excuse my poor English.
I also implemented Prometheus output for goss. https://github.com/harre-orz/goss/commit/608323699f39d0c4823e7dff6d932e74fc8e758b
The output format is as follows.
- Collect the outcome (success = 0, failure = 1, skipped = 2) with goss_result (like https://github.com/Smithx10/goss/commit/b336a7e146843393f4a5266495cf62525f2ae131)
# HELP goss_result Lets you know if goss assertions were true 0, or false 1, or skip 2
# TYPE goss_result gauge
goss_result{property="enabled",resource_id="sshd",resource_type="Service",title=""} 1
goss_result{property="running",resource_id="sshd",resource_type="Process",title=""} 0
goss_result{property="running",resource_id="sshd",resource_type="Service",title=""} 2
- Collect execution time with goss_duration
# HELP goss_duration Lets you know duration of goss execution
# TYPE goss_duration gauge
goss_duration{property="enabled",resource_id="sshd",resource_type="Service",title=""} 0.007284487
goss_duration{property="running",resource_id="sshd",resource_type="Process",title=""} 1.125e-05
goss_duration{property="running",resource_id="sshd",resource_type="Service",title=""} 0
I think it is necessary to collect the goss_result and goss_duration metrics separately, and to give goss_result and goss_duration the same labels.
Considering different formats
If the result is appended as a label, as shown below, the dimensions will differ between series and I will not be able to write efficient PromQL.
# HELP goss_result bad example 1
# TYPE goss_result gauge
goss_result{property="enabled",resource_id="sshd",resource_type="Service",title="", result="success"} 1
goss_result{property="running",resource_id="sshd",resource_type="Process",title="", result="failure"} 1
goss_result{property="running",resource_id="sshd",resource_type="Service",title="", result="skipped"} 1
To make the dimensions match, I would need three times as many series, as shown below.
# HELP goss_result bad example 2
# TYPE goss_result gauge
goss_result{property="enabled",resource_id="sshd",resource_type="Service",title="", result="success"} 1
goss_result{property="enabled",resource_id="sshd",resource_type="Service",title="", result="failure"} 0
goss_result{property="enabled",resource_id="sshd",resource_type="Service",title="", result="skipped"} 0
goss_result{property="running",resource_id="sshd",resource_type="Process",title="", result="success"} 0
goss_result{property="running",resource_id="sshd",resource_type="Process",title="", result="failure"} 1
goss_result{property="running",resource_id="sshd",resource_type="Process",title="", result="skipped"} 0
goss_result{property="running",resource_id="sshd",resource_type="Service",title="", result="success"} 0
goss_result{property="running",resource_id="sshd",resource_type="Service",title="", result="failure"} 0
goss_result{property="running",resource_id="sshd",resource_type="Service",title="", result="skipped"} 1
Therefore, I represent the goss_result metric numerically: success = 0, failure = 1, skipped = 2.
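The series-count argument above can be checked with a short sketch. It builds the same three checks in both encodings: one sample per check with a numeric outcome, and one sample per check per possible outcome (as in "bad example 2"). The `Outcome`, `Check`, `numericSeries`, and `oneHotSeries` names are hypothetical, and the title label is left out for brevity.

```go
package main

import "fmt"

// Outcome encodes a check result as the numeric value proposed above.
type Outcome int

const (
	Success Outcome = iota // 0
	Failure                // 1
	Skipped                // 2
)

var outcomeNames = []string{"success", "failure", "skipped"}

// Check is a hypothetical, simplified goss assertion result.
type Check struct {
	ResourceType, ResourceID, Property string
	Outcome                            Outcome
}

// labels renders the label set shared by both encodings.
func labels(c Check) string {
	return fmt.Sprintf(`property="%s",resource_id="%s",resource_type="%s"`,
		c.Property, c.ResourceID, c.ResourceType)
}

// numericSeries emits one sample per check; the value is the outcome code.
func numericSeries(checks []Check) []string {
	var out []string
	for _, c := range checks {
		out = append(out, fmt.Sprintf("goss_result{%s} %d", labels(c), c.Outcome))
	}
	return out
}

// oneHotSeries emits one sample per check per possible outcome.
func oneHotSeries(checks []Check) []string {
	var out []string
	for _, c := range checks {
		for i, name := range outcomeNames {
			v := 0
			if Outcome(i) == c.Outcome {
				v = 1
			}
			out = append(out, fmt.Sprintf(`goss_result{%s,result="%s"} %d`, labels(c), name, v))
		}
	}
	return out
}

func main() {
	checks := []Check{
		{"Service", "sshd", "enabled", Failure},
		{"Process", "sshd", "running", Success},
		{"Service", "sshd", "running", Skipped},
	}
	fmt.Println("numeric encoding:", len(numericSeries(checks)), "series")
	fmt.Println("one-hot encoding:", len(oneHotSeries(checks)), "series")
}
```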
@harre-orz Looks good. I would have 2 suggestions:
- Add the unit to the goss_duration metric (i.e. goss_duration_seconds) according to the best practices of Prometheus metric naming
- Maybe add 4 more metrics (goss_tests_total, goss_tests_failed_total, goss_tests_skipped_total and goss_test_duration_seconds) so it might be easier to get those numbers, although potentially some could be retrieved with PromQL
Hi @timeu, thank you for your good suggestions.
- goss_duration_seconds is a good name. Changed to it. https://github.com/harre-orz/goss/commit/ab578eaa03616b196f5772f66662a596c0ad69ca
- I think summary metrics are not needed, because they can be calculated with PromQL:
# Ex1: get test total count by PromQL
count(goss_result{}) or on() vector(0)
# Ex2: get failed test count by PromQL
count(goss_result{} == 1) or on() vector(0)
But this PromQL is difficult. Do you think it is better to create summary metrics?
If you want to include summary metrics, my suggestion for the output format is below:
# HELP goss_tests_count Test count of goss assertions
# TYPE goss_tests_count gauge
goss_tests_count 3
# HELP goss_tests_failed_count Test failed count of goss assertions
# TYPE goss_tests_failed_count gauge
goss_tests_failed_count 1
# HELP goss_tests_skipped_count Test skipped count of goss assertions
# TYPE goss_tests_skipped_count gauge
goss_tests_skipped_count 1
https://github.com/harre-orz/goss/commit/721e1ab0ff842033bddd464f1dbe5e999f54c83e
@harre-orz: You might be right regarding the summary statistics. I am just wondering if the other output formats (json, etc.) output those summary statistics as well, and if it makes sense to have the Prometheus output aligned with them. I am not sure what the best practice is regarding Prometheus and those summary statistics.
If you output the summary statistics then I would recommend to use: goss_tests_skipped_total instead of goss_tests_skipped_count (also for the other ones).
According to https://prometheus.io/docs/practices/naming/ and https://prometheus.io/docs/instrumenting/writing_exporters/, _count is for summaries and _total is for regular counters.
Edit: If you output the summary statistics, I would also output goss_tests_duration_seconds
Edit2: Thinking a bit more about _total vs _count suffix, I think in case of total test results _count could be also correct.
Thank you @timeu, I think it's better to include the execution time of goss.
The execution time of goss (goss_tests_duration_seconds) is not equal to the sum of goss_duration_seconds. Other formats (Ex: json) have similar specifications.
The goss_tests_duration_seconds metric is as follows:
# HELP goss_tests_duration_seconds Execution time of goss assertions
# TYPE goss_tests_duration_seconds gauge
goss_tests_duration_seconds 0.013728257
https://github.com/harre-orz/goss/commit/9c947dadab1a52af441980217139c042d963f743
I think _total is not a proper suffix for an accumulating count.
I read https://prometheus.io/docs/instrumenting/writing_exporters/, but maybe _sum should be used instead of _count.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
For those that commented on this issue, please see #607 PR. I would be interested in a review from the community and don't want to merge something in that's not agreed upon.
Last call for feedback on #607 from @petemounce
I will most likely merge it in a week or so if no one has objections to the current implementation.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
up. not stale.
Marked as approved so stale bot leaves it alone.
The PR/implementation is still being worked on, but at this point a metrics endpoint (Prometheus) is approved.
If somebody needs this, I created a sidecar container that does the exporting: https://github.com/DracoBlue/goss-metrics-exporter
Hello all, saw a new attempt at this here: #771
I would love some feedback/reviews from those interested here if this is the preferred approach for the community over #607
#607 has been merged, marking this as closed since it will be in the next release.
Thank you all for your time, opinions, and contributions!