initial ssl_exporter implementation
PR Description
Implements support for ribbybibby/ssl_exporter
Which issue(s) this PR fixes
#1670
Notes to the Reviewer
I wasn't able to pull down a recent version of ribbybibby/ssl_exporter. I forked it and regenerated the go.mod file, which seemed to do the trick, so right now the dependency points at my fork. I filed an issue over there and intend to keep this PR open for review in the meantime, updating once that's fixed.
The linter and tests passed for ssl_exporter on my end, but some other tests are failing and I'm not sure if/how I introduced the failures. Manually tested with the configs mentioned in the added docs plus some I'm using personally.
PR Checklist
- [x] CHANGELOG updated
- [x] Documentation added
- [ ] Tests updated
Hey James 👋 thanks for taking the time to contribute to Grafana Agent!
I still haven't taken a proper look, but just wanted to mention that the test failures are due to the upgrade of prometheus/common. Commit https://github.com/prometheus/common/commit/252ff6f235cdd75735af7ddec99ac2f783326bff allowed users to explicitly enable/disable HTTP2, which appears as a new field in our configuration. I don't think it causes us any issues, but since the upgrade came along with a new dependency, we should probably update our tests to include the new field.
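For illustration, a minimal sketch of the new field (assuming it surfaces as `EnableHTTP2` with an `enable_http2` yaml tag on prometheus/common's `config.HTTPClientConfig`; not code from this PR):

```go
package main

import (
	"fmt"

	promcfg "github.com/prometheus/common/config"
	"gopkg.in/yaml.v2"
)

func main() {
	// Unmarshal a client config that sets the new field explicitly.
	var cfg promcfg.HTTPClientConfig
	if err := yaml.Unmarshal([]byte("enable_http2: true\n"), &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.EnableHTTP2) // true

	// Marshalling a config now emits enable_http2 as well, which is why the
	// expected YAML fixtures in the agent's tests need the extra line.
	out, _ := yaml.Marshal(&cfg)
	fmt.Print(string(out))
}
```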
@tpaschalis doh! I thought those upgrades might have been it, I tried reverting some changes, but thanks for confirming 😅 Gonna get some coffee and I'll patch this up
Amended the tests that were only missing `enable_http2: true`. There still seems to be one more test failing in CI whose cause I can't quite pin down. It passes for me locally:
go test -timeout 30s -run ^TestOTelConfig$ github.com/grafana/agent/pkg/traces
ok github.com/grafana/agent/pkg/traces
I ran a diff between expected and actual, but it turned up empty aside from the memory addresses, which assert.Equal shouldn't fail on to my understanding. But those diffs in the CI log don't seem to lie; I might be missing something small.
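(Side note for context, not from the thread: testify's assert.Equal falls back to reflect.DeepEqual, which follows pointers rather than comparing addresses, so differing memory addresses alone shouldn't fail the assertion. A tiny sketch:)

```go
package main

import (
	"fmt"
	"reflect"
)

type endpoint struct{ Addr *string }

func main() {
	a, b := "localhost:4317", "localhost:4317"
	x := endpoint{Addr: &a}
	y := endpoint{Addr: &b}

	// Different pointer addresses, equal pointed-to values: DeepEqual
	// (and therefore testify's assert.Equal) reports them as equal.
	fmt.Println(reflect.DeepEqual(x, y)) // true
}
```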
Oh, I guess I was using the wrong flags; I should have used `make DOCKER_OPTS="" K8S_USE_DOCKER_NETWORK=1 DRONE=true BUILD_IN_CONTAINER=false test`. Let me dig in a little more...
Hm, I'm experiencing different results on subsequent runs without any changes. It fails pretty consistently in CI. I thought that by replicating the go test command to match exactly what CI runs, it would fail the same way and I could troubleshoot. It did fail for a while, but just now it started randomly passing again without any changes to config_test.go relative to this branch.
ok github.com/grafana/agent/pkg/traces 11.269s coverage: 80.8% of statements
^this is from a package-wide test run
$ CGO_ENABLED=1 go test -ldflags "-s -w -X github.com/grafana/agent/pkg/build.Branch=main -X github.com/grafana/agent/pkg/build.Version=main-57cab84-WIP -X github.com/grafana/agent/pkg/build.Revision=57cab84e -X github.com/grafana/agent/pkg/[email protected] -X github.com/grafana/agent/pkg/build.BuildDate=2022-05-05T22:53:54Z" -tags "netgo" -race -cover -coverprofile=cover.out -p=4 -run ^TestOTelConfig$ github.com/grafana/agent/pkg/traces
ok github.com/grafana/agent/pkg/traces 11.831s coverage: 49.9% of statements
$ git diff pkg/traces/config_test.go
$
^this is from running just that one test, showing no changes to config_test.go
I think I might be missing something here
So I confirmed I wasn't going mad by running the test in a while loop: out of 10 tries, the test failed on the 8th attempt and passed on all the others. I think what we're running into here is sort.Slice:
> The sort is not guaranteed to be stable: equal elements may be reversed from their original order. For a stable sort, use SliceStable.
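A small sketch of the difference (illustrative only; the pipeline struct below is an assumption, not the agent's actual types):

```go
package main

import (
	"fmt"
	"sort"
)

type pipeline struct {
	Name     string
	Exporter string
}

func main() {
	pipelines := []pipeline{
		{"traces/1", "otlp"},
		{"traces/1", "loadbalancing"}, // same sort key as the line above
		{"traces/0", "otlp"},
	}

	less := func(i, j int) bool { return pipelines[i].Name < pipelines[j].Name }

	// sort.Slice makes no guarantee about the relative order of elements
	// that compare equal, so the two "traces/1" entries may end up in
	// either order from run to run.
	sort.Slice(pipelines, less)

	// sort.SliceStable keeps equal elements in their original relative order.
	sort.SliceStable(pipelines, less)

	fmt.Println(pipelines)
}
```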
~~Changed to sort.SliceStable, hoping that does the trick. I was able to capture the expected and actual data structures in the event of an error and I can see that the exporters weren't always in the same order after being passed to sortPipelines.~~
Sorted the wrong thing. Added assertConfigEqual to compare the two regardless of order.
That test function, assertConfigEqual, does seem to do the trick, but it's really ugly. Hoping folks here who are more skilled in the ways of golang can show me a better way 😅 For clarity, the reason I created it is that the test TestOTelConfig/tail_sampling_config_with_load_balancing was intermittently failing because expectedConfig was not always in the same order as actualConfig. assertConfigEqual tries to sort everything in those objects before comparison.
On a separate note, go.mod is pointing to the right repo now, so we should be good on that front.
edit: I cleaned it up a little bit, but maybe there's a better way of achieving the overall behavior?
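For illustration, one common shape for this kind of order-insensitive comparison is to sort deep copies of both sides on a deterministic key before asserting equality. The helper below is a hypothetical sketch with made-up types (pipeline, sortedCopy), not the assertConfigEqual from this PR:

```go
package traces_test

import (
	"sort"
	"testing"

	"github.com/stretchr/testify/assert"
)

type pipeline struct {
	Name      string
	Exporters []string
}

// sortedCopy returns a copy with pipelines and their exporters in a
// deterministic order, so two configs can be compared regardless of the
// order in which they were built.
func sortedCopy(in []pipeline) []pipeline {
	out := make([]pipeline, len(in))
	copy(out, in)
	for i := range out {
		exps := make([]string, len(out[i].Exporters))
		copy(exps, out[i].Exporters)
		sort.Strings(exps)
		out[i].Exporters = exps
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// assertPipelinesEqual compares two pipeline slices regardless of order.
func assertPipelinesEqual(t *testing.T, expected, actual []pipeline) {
	t.Helper()
	assert.Equal(t, sortedCopy(expected), sortedCopy(actual))
}
```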
I'm playing around with the exporter, so more detailed comments are coming, but a high-level note: I think we might be able to drop the separate v2 integration and use a "shim". Here's how we do it for other exporters, for example, and I think it will simplify things. WDYT?
yo @tpaschalis, I actually took a stab at this last night and today. ssl_exporter doesn't have an exporter the way other integrations do, so it was really tricky to get this going. After removing the separate v2/ integration and setting up the shim, v1 worked fine, but v2 didn't do anything. So I did my best to make a mini exporter that collects everything from the registry and sends it to the channel that way. The tests for ssl_exporter itself pass on my end and I've tested this minimally against some local resources, so I'm just checking this in now before I go too far down a rabbit hole 🙃
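Roughly, that pattern means implementing prometheus.Collector, calling Gather() on the exporter's registry inside Collect, and re-emitting the gathered samples as const metrics. The sketch below illustrates the general approach only (type names are assumptions, not this PR's code):

```go
package sslexporterdemo

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

// registryCollector re-emits everything gathered from an inner registry.
type registryCollector struct {
	reg *prometheus.Registry
}

// Describe sends nothing, which marks this as an unchecked collector; fine
// for a pass-through whose metric set isn't known up front.
func (c *registryCollector) Describe(ch chan<- *prometheus.Desc) {}

func (c *registryCollector) Collect(ch chan<- prometheus.Metric) {
	mfs, err := c.reg.Gather()
	if err != nil {
		log.Printf("gather failed: %v", err)
		return
	}
	for _, mf := range mfs {
		for _, m := range mf.GetMetric() {
			var names, values []string
			for _, lp := range m.GetLabel() {
				names = append(names, lp.GetName())
				values = append(values, lp.GetValue())
			}
			desc := prometheus.NewDesc(mf.GetName(), mf.GetHelp(), names, nil)

			switch mf.GetType() {
			case dto.MetricType_COUNTER:
				ch <- prometheus.MustNewConstMetric(desc, prometheus.CounterValue, m.GetCounter().GetValue(), values...)
			case dto.MetricType_GAUGE:
				ch <- prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, m.GetGauge().GetValue(), values...)
			case dto.MetricType_UNTYPED:
				ch <- prometheus.MustNewConstMetric(desc, prometheus.UntypedValue, m.GetUntyped().GetValue(), values...)
			}
		}
	}
}
```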
Hey man, sorry for the intermittent replies, most of the team was out of office these past two weeks.
I looked through your second approach and you're right, the shim doesn't play well with multi-target exporters like this one, as it doesn't propagate the query params.
I'm cc-ing @mattdurham for his opinion here on what course to follow before committing to something :) I think v1 is on a good path, just making sure which way to go for v2.
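(For context on the multi-target point above: these exporters pick the probe target per scrape from query parameters, e.g. /probe?target=example.com, so a wrapping layer that drops the query string leaves the exporter with nothing to probe. A minimal illustrative handler, not ssl_exporter's actual code:)

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func probeHandler(w http.ResponseWriter, r *http.Request) {
	// The target is chosen per scrape via ?target=...; if a wrapping layer
	// drops the query string, there is nothing to probe.
	target := r.URL.Query().Get("target")
	if target == "" {
		http.Error(w, "missing 'target' parameter", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "would probe %q here\n", target)
}

func main() {
	http.HandleFunc("/probe", probeHandler)
	log.Fatal(http.ListenAndServe(":9219", nil)) // example listen address
}
```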
Apologies, I was at the offsite last week and spent the beginning of this week battling what was thankfully just a cold. I have this on my radar to take a look at today.
Hey no worries at all. Hope y'all had fun at the offsite and that you're feeling better Matt! Let me know where you want to go with this and I'm happy to add/remove where necessary.
@jamesalbert should have a branch with some changes for V2 tomorrow for you to take a look at
By tomorrow I mean today; I ended up making some significant changes. Porting them into a v2 integration now.
@jamesalbert created a branch https://github.com/grafana/agent/tree/ssl_exporter with my changes. It's a bit rough and I want to clean it up and see what we can share between v1/v2, but please take a look.
yo @mattdurham, all that sounds good. How do y'all want to continue with this? Should I merge that branch into my fork's main? Or do you want to hold off until further cleanup?
The first thing that stands out to me, though, is ssl_exporter.go existing in both v1 and v2. Is there any way we can share it between the two? I haven't gotten a chance to fully dig in yet, so I'm not sure if you adjusted anything in v2's ssl_exporter.go; it looks like an added file to me at this point, but I'll try to get some time in here soon.
There are differences due to how v1/v2 work; that being said, we can probably simplify it. I can take a spin at it tomorrow morning and see what I can simplify on my end.
This PR has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!
haven't abandoned this :) other things popped up. Will revisit soon
This PR has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!