
Implementing Prometheus Exporter and documentation

Open flaviostutz opened this issue 3 years ago • 10 comments

Background

This is a complete initial implementation of Prometheus Exporter support as discussed in #477

Checklist

  • [X] Git commit messages conform to community standards.
  • [X] Each Git commit represents meaningful milestones or atomic units of work.
  • [X] Changed or added code is covered by appropriate tests.

flaviostutz avatar Jul 24 '20 20:07 flaviostutz

This is a continuation of PR https://github.com/tsenart/vegeta/pull/526, but I decided to open a new PR to clean things up for a fresh review.

flaviostutz avatar Jul 24 '20 20:07 flaviostutz

@tsenart Could you please review this PR?

flaviostutz avatar Aug 05 '20 23:08 flaviostutz

@flaviostutz: I've been swamped with work but saved some time to look at this on Saturday! Sorry.

tsenart avatar Aug 06 '20 09:08 tsenart

No worries... I know how it is... here with me it's the same... thanks!

flaviostutz avatar Aug 06 '20 12:08 flaviostutz

@tsenart could you please take a look at this PR when you have some time? I think I managed to do all the requested changes.

flaviostutz avatar Aug 28 '20 23:08 flaviostutz

@tsenart please take a look at the requested changes. Sorry for the late reply...

flaviostutz avatar Oct 09 '20 21:10 flaviostutz

Hi guys,

Hope you are all well !

Just wanted to know the current status of this PR as it sounds awesome for load testing any webapp.

What is missing to merge it?

Cheers, Luc

ghost avatar Dec 03 '20 04:12 ghost

As someone who uses Vegeta as a library for testing/benchmarking/loadtesting, this is fantastic! Having to manually implement OpenCensus middleware for HTTP is a bit of a pain.

Just a note, is there a way to customise the latency histogram buckets? It looked like they were hardcoded to me, and since requestSecondsHistogram is not exposed I can't change it.

I can think of two immediate ways of making it customisable: exposing the Prometheus metric types to the user, or letting the user pass the buckets as a parameter when creating the PrometheusMetrics via NewPrometheusMetricsWithParams.

leosunmo avatar Dec 16 '20 00:12 leosunmo

Hi people!

Any news on this one? This functionality seems to be widely needed. Many thanks for the nice work! cc @flaviostutz @tsenart

dntosas avatar Jan 23 '21 06:01 dntosas

@hfuss @flaviostutz I'm wondering what the state of the PR is and if it's still possible to implement Prometheus support.

I agree that normally an ad-hoc job like a Vegeta process could benefit from just delivering results to Pushgateway. I also think creating an exporter could be useful if you're going to run Vegeta for a long time and want a simpler integration.

I'm preparing to use Vegeta for some long-running (hours-days) load-testing jobs and wondered if I could help at all with Prometheus integration one way or the other.

  • Do we think it's still possible to create the exporter?
  • Should we instead (or also) make it easy to transmit to Pushgateway via vegeta report or something?

spacez320 avatar May 23 '21 23:05 spacez320

Depending on the use case, I would advise against the Pushgateway approach except as a last resort. In my efforts to use it, I observed that metrics sent to the push gateway have no TTL and persist forever, and a metric's value only updates if you keep sending new values with the exact same labels/dimensions. If the labels change over time (different instance, host, etc.), you just get new series instead, and both the old and new metrics persist in the system.

So when viewing the metrics in Grafana, say, you get a never-ending set of time series, rather than series with lines only for the intervals when metrics were sent to the gateway and gaps on the chart where nothing was sent. I wanted and expected the latter but got the former. For ad-hoc and periodic load testing, I'd assume people want the latter. The former is more suitable if, say, you ran Vegeta to monitor a test instance that never changes, the same service monitored over time.

For what I want to do, if you go with the push gateway, you may need something like this forked version, which adds a TTL to push gateway metrics: https://github.com/dinumathai/pushgateway

So I think the exporter route is good; it doesn't have the TTL issue that the push gateway has.

Another option to consider, as an alternative to both the exporter and the push gateway, is the Prometheus remote write protocol as a way to push metrics into Prometheus. We're testing remote write at our organization; I'm not yet familiar with the specific implementation or how well it performs. And I wouldn't know, in Vegeta's case, which approach is more optimal from a performance standpoint, since Vegeta needs to both generate the load and expose or push the metrics. https://last9.io/blog/what-is-prometheus-remote-write/

daluu avatar Jul 11 '23 01:07 daluu

@daluu: Makes sense. I think the Prometheus exporter should then be implemented as a vegeta sub command, like vegeta prom-export, which you'd use with vegeta attack ... | tee results.bin | vegeta prom-export

tsenart avatar Jul 11 '23 06:07 tsenart

As discussed in #477, we decided to instrument attack itself because then we don't need to parse stdin, and it's simpler and more performant.

If we decide to change this at this point in the PR, we should actually close it and start a new branch, because a lot of things would be different.

I'd recommend we finish this PR with the initial requirements, since it's almost there; if necessary, we can later discuss a reporter/Pushgateway/remote-write version of it.

What do you think?

flaviostutz avatar Jul 18 '23 23:07 flaviostutz

Ah, I had forgotten that discussion. It was a long time ago!

Thinking about it now, having the attack command expose a web server handler that a Prometheus instance can scrape is good for performance and reducing moving pieces in distributed attacks.

But for interactive use and debugging past attacks, I think we should still have a sub command that exports saved results as Prometheus metrics.

So, the answer is I think we need both, and they should share as much code as possible. The only difference between doing it in attack and doing it in another sub-command is that in attack we can observe the metric without decoding the result first.

I suggest we introduce a lib/prom package where all of this is encapsulated and modularized.

Also, very much up for hacking on this together. So let me know how you'd want to go about it.

tsenart avatar Jul 19 '23 07:07 tsenart

This PR already creates a lib/prom package. I just reviewed and fixed all the open comments (after 3 years, lol!). Please do another review round and mark comments as resolved if they're OK.

flaviostutz avatar Jul 22 '23 15:07 flaviostutz

I squashed the messy commits (various upstream merges and small changes) from my branch into one and applied the PGP signatures so you can merge it to master 😁

flaviostutz avatar Jul 22 '23 16:07 flaviostutz

@tsenart Pushed everything I changed here. In summary, I tried to resolve all the comments, so please do another round of checks.

As this PR is already quite big, I'd advise merging it as soon as possible and building the "prom" command in a follow-up PR.

flaviostutz avatar Jul 23 '23 21:07 flaviostutz

Thank you for your great effort on this <3 Landed in 81403a61f75e655a8bfe2991a58a58defe5a2142. Opened https://github.com/tsenart/vegeta/issues/637 which I'll work on next!

tsenart avatar Jul 24 '23 17:07 tsenart

@daluu, regarding this point in https://github.com/tsenart/vegeta/pull/534#issuecomment-1629943731: "have no TTL and persist forever, and the metric's value only updates if you send new values over time with the exact same labels/dimensions"

FYI, I use a Pushgateway and delete all the Vegeta-created metrics at the end, so I see no problem with the Pushgateway:

import (
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
	vegeta "github.com/tsenart/vegeta/v12/lib"
	"github.com/tsenart/vegeta/v12/lib/prom"
)

// PushGatewayRegister adapts a push.Pusher to the prometheus.Registerer interface.
type PushGatewayRegister struct {
	Pusher *push.Pusher
}

// MustRegister implements prometheus.Registerer.
func (p *PushGatewayRegister) MustRegister(c ...prometheus.Collector) {
	for _, v := range c {
		p.Pusher = p.Pusher.Collector(v)
	}
}

// Register implements prometheus.Registerer.
func (p *PushGatewayRegister) Register(c prometheus.Collector) error {
	p.Pusher = p.Pusher.Collector(c)
	return nil
}

// Unregister implements prometheus.Registerer.
func (p *PushGatewayRegister) Unregister(c prometheus.Collector) bool {
	return true
}

func LoadTest() {
	metrics := prom.NewMetrics()

	pusher := push.New("url", "job")
	if err := metrics.Register(&PushGatewayRegister{Pusher: pusher}); err != nil {
		log.Fatal(err)
	}

	rate := vegeta.Rate{Freq: 10, Per: time.Second}
	duration := 30 * time.Second
	targeter := vegeta.NewStaticTargeter(vegeta.Target{Method: "GET", URL: "http://localhost:8080"})
	attacker := vegeta.NewAttacker()

	for res := range attacker.Attack(targeter, rate, duration, "Big Bang!") {
		metrics.Observe(res)
		if err := pusher.Push(); err != nil {
			log.Println(fmt.Errorf("push error: %w", err))
		}
	}

	// That's the point: Delete removes all metrics created by this pusher
	// from the Pushgateway.
	_ = pusher.Delete()
}



but change my mind

fasibio avatar Feb 28 '24 17:02 fasibio

@fasibio

FYI, I use a Pushgateway and delete all the Vegeta-created metrics at the end, so I see no problem with the Pushgateway:

As long as that works out in the end, then yes, it should be fine. I would advise testing that it works as anticipated: send some metrics, then stop sending for a while (e.g., 15 minutes to an hour or more), and verify that Grafana (or an equivalent data viewer) renders the metric as discrete data points only for the times metrics were sent, rather than an extrapolated line that continues through the current time even though no more metrics were sent.

Expecting how something works in theory is different from actually validating it with testing. So as long as this special handling in the push gateway has been confirmed to work as expected, it sounds good.

Note that this approach may work better when you have more direct access to the push gateway's interface. With the library/interface we were using at the time, I'm not aware of a way to "delete" metrics on the push gateway; we only sent metrics to it.

but change my mind

I'm not sure what is meant here; it's a little open to interpretation. Did you mean that, unless someone can convince you otherwise, the Pushgateway works fine for you? Or did you mean that, despite what you mentioned, you've decided to change your mind about the Pushgateway approach?

daluu avatar Feb 28 '24 19:02 daluu

@daluu I use "github.com/prometheus/client_golang/prometheus/push" and call pusher.Delete() at the end of the test. It removes all metrics created by the same pusher object.

My Grafana graph looks like this: [screenshot]

As you can see, there is data only at the time of the attack. So I think advising against the Pushgateway (see the README) is incorrect.

fasibio avatar Mar 06 '24 09:03 fasibio

So I think advising against the Pushgateway (see the README) is incorrect.

Yes, makes sense. I take back my prior advice, but with the caveat that the user must properly use the push gateway's delete logic. If you omit the delete step at the end, I believe you will run into the concern I previously mentioned (one can try to confirm it). So if we're building a solution and documentation here, we need to account for that to ensure a successful deployment.

@fasibio, I'm curious what made you issue a delete at the end of pushing metrics. Were you somehow aware of the need for it (or of the issue when you don't delete), did you come across it by trial and error, or did you find it documented somewhere? Unless I overlooked it, I don't recall the documentation or example code telling the user to issue a delete after pushing metrics. The push gateway's design isn't intuitive to me: one would think the push (and pull) model is discrete, in that you send or poll data, you get data, and when no push or poll is occurring, there should be no data. But the push gateway holds on to metrics for continuous forwarding to Prometheus unless you clear them out explicitly, even when the metric values don't change.

daluu avatar Mar 06 '24 18:03 daluu

@daluu Simple: I followed the Pushgateway's "Use it" documentation (not Go-specific), and the curl DELETE command is part of it: https://github.com/prometheus/pushgateway?tab=readme-ov-file#use-it. That's all.
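For reference, the push-then-delete flow from that README section looks roughly like this (the host, job name, and instance name are placeholders):

```shell
# Push (replacing) metrics for the grouping identified by job/instance:
cat <<'EOF' | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/vegeta/instance/run1
# TYPE vegeta_requests_total counter
vegeta_requests_total 42
EOF

# Delete everything pushed under that grouping once the load test ends,
# so stale series don't keep getting scraped into Prometheus:
curl -X DELETE http://pushgateway.example.org:9091/metrics/job/vegeta/instance/run1
```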

@tsenart See the discussion above; it might make sense to update the README.

fasibio avatar Mar 21 '24 09:03 fasibio