agent icon indicating copy to clipboard operation
agent copied to clipboard

initial ssl_exporter implementation

Open jamesalbert opened this issue 2 years ago • 19 comments

PR Description

Implements support for ribbybibby/ssl_exporter

Which issue(s) this PR fixes

#1670

Notes to the Reviewer

I wasn't able to pull down a recent version of ribbybibby/ssl_exporter. I forked it and regenerated the go.mod file which seemed to do the trick. So right now it's all pointed to my repo; I filed an issue over there and intend to keep this PR open for review in the meantime and update when fixed.

linter and tests passed for ssl_exporter on my end, but some tests are failing and I'm not sure if/how I introduced them. Manually tested with configs mentioned in the added docs + some I'm using personally.

PR Checklist

  • [x] CHANGELOG updated
  • [x] Documentation added
  • [ ] Tests updated

jamesalbert avatar May 05 '22 07:05 jamesalbert

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 05 '22 08:05 CLAassistant

Hey James 👋 thanks for taking the time to contribute to Grafana Agent!

I still haven't taken a proper look, but just wanted to mention the test failures are due to the upgrade of prometheus/common. Commit https://github.com/prometheus/common/commit/252ff6f235cdd75735af7ddec99ac2f783326bff allowed users to explicitly enable/disable HTTP2, which appears as a new field in our configuration. I think we don't have an issue with it, but if it's an upgrade that came along a new dependency, we should probably update our tests to include this new field.

tpaschalis avatar May 05 '22 13:05 tpaschalis

@tpaschalis doh! I thought those upgrades might have been it, I tried reverting some changes, but thanks for confirming 😅 Gonna get some coffee and I'll patch this up

jamesalbert avatar May 05 '22 16:05 jamesalbert

amended the tests that were only missing enable_http2: true. There still seems to be one more test failing in CI that I can't quite determine the cause. It passes for me locally:

go test -timeout 30s -run ^TestOTelConfig$ github.com/grafana/agent/pkg/traces
ok      github.com/grafana/agent/pkg/traces

I ran a diff against expected and actual, but it turned up empty aside from the memory addresses which assert.Equal shouldn't fail on to my understanding. But those diffs in the ci log don't seem to lie, I might be missing something small

oh I guess I was using the wrong flags, should have used make DOCKER_OPTS="" K8S_USE_DOCKER_NETWORK=1 DRONE=true BUILD_IN_CONTAINER=false test. let me dig in a little more...

jamesalbert avatar May 05 '22 22:05 jamesalbert

hm, I'm experiencing different results on subsequent runs without any changes. It fails pretty consistently in ci. I thought by replicating the go test command to match exactly what ci has, it would fail as expected and I could troubleshoot. And it did fail for a while, but just now started to randomly pass again without any changes to config_test.go relative to this branch.

ok      github.com/grafana/agent/pkg/traces     11.269s coverage: 80.8% of statements

^this is from a package wide test

$ CGO_ENABLED=1 go test -ldflags "-s -w -X github.com/grafana/agent/pkg/build.Branch=main -X github.com/grafana/agent/pkg/build.Version=main-57cab84-WIP -X github.com/grafana/agent/pkg/build.Revision=57cab84e -X github.com/grafana/agent/pkg/[email protected] -X github.com/grafana/agent/pkg/build.BuildDate=2022-05-05T22:53:54Z" -tags "netgo"  -race -cover -coverprofile=cover.out -p=4  -run ^TestOTelConfig$ github.com/grafana/agent/pkg/traces
ok      github.com/grafana/agent/pkg/traces     11.831s coverage: 49.9% of statements

$ git diff pkg/traces/config_test.go
$

^this is from just that one test showing no changes to config_test.go

I think I might be missing something here

jamesalbert avatar May 05 '22 23:05 jamesalbert

So I confirmed I wasn't going mad by running the test through a while loop. Out of 10 tries, on the 8th attempt the test failed. All other runs passed. I think what we're running into here is sort.Slice

The sort is not guaranteed to be stable: equal elements may be reversed from their original order. For a stable sort, use SliceStable.

~Changed to sort.SliceStable hoping that does the trick. I was able to capture the expected and actual data structures in the event of an error and I can see that the exporters weren't always in the same order after being passed to sortPipelines~

sorted the wrong thing. added assertConfigEqual to compare the two regardless of order.

jamesalbert avatar May 06 '22 01:05 jamesalbert

that test function assertConfigEqual, while it does seem to do the trick, is really ugly. Hoping folks here who are more skilled in the ways of golang can show me a better way 😅 for clarity, the reason I created it is because the test TestOTelConfig/tail_sampling_config_with_load_balancing was intermittently failing because the expectedConfig was not always in the same order as actualConfig. assertConfigEqual tries to sort everything in those objects before comparison

but at this point, go.mod's pointing to the right repo now so we should be good on that front

edit: I cleaned it up a little bit, but maybe there's a better way of achieving the overall behavior?

jamesalbert avatar May 07 '22 21:05 jamesalbert

I'm playing around with the exporter, so more detailed comments are coming along, but a high-level note; I think we might be able to drop the v2 and use a "shim". Here's how we do it for other exporters for example and I think it will simplify things. WDYT?

yo @tpaschalis, I actually took a stab at this last night and today. ssl_exporter doesn't have an exporter the way other integrations do so it was really tricky to get this going. After removing the separate v2/ integration and setting up the shim, v1 worked fine, but v2 didn't do anything. So I tried my best making a mini exporter that collects everything from the registry and sends it to the channel that way. The tests for ssl_exporter itself pass on my end and I've tested this minimally against some local resources, so I'm just checking this in right now before I go too far down a rabbit hole 🙃

jamesalbert avatar May 14 '22 01:05 jamesalbert

Hey man, sorry for the intermittent replies, most of the team was out of office these past two weeks.

I looked through your second approach, and you're right, the shim doesn't play well with multi-target exporters like this one as it doesn't propagate the query params.

I'm cc-ing @mattdurham for his opinion here on what course to follow before committing to something :) I think v1 is on a good path, just making sure which way to go for v2.

tpaschalis avatar May 19 '22 09:05 tpaschalis

Apologies, was at the offsite last week and spent the beginning of the week battling a thankfully just a cold. I have this on my radar to take a look at today.

mattdurham avatar May 19 '22 13:05 mattdurham

Hey no worries at all. Hope y'all had fun at the offsite and that you're feeling better Matt! Let me know where you want to go with this and I'm happy to add/remove where necessary.

jamesalbert avatar May 19 '22 17:05 jamesalbert

@jamesalbert should have a branch with some changes for V2 tomorrow for you to take a look at

mattdurham avatar May 19 '22 19:05 mattdurham

By tomorrow I mean today, ended up making some significant changes. Porting the changes into a v2 integration now

mattdurham avatar May 23 '22 15:05 mattdurham

@jamesalbert created a branch https://github.com/grafana/agent/tree/ssl_exporter with my changes, its a bit rough and I want to cleanup, see what we can share between v1/v2 but please give a look.

mattdurham avatar May 23 '22 16:05 mattdurham

yo @mattdurham all that sounds good. How do y'all want to continue with this? Should I merge in that branch into my fork's main? Or do you want to hold off until further cleanup?

The first thing that stands out to me though is ssl_exporter.go in both v1 and v2. Is there anyway we can share between the two? I haven't gotten a chance to fully dig in yet so I'm not sure if you adjusted anything in v2's ssl_exporter.go, it looks like an added file to me at this point, but I'll try getting some time in here soon

jamesalbert avatar May 24 '22 19:05 jamesalbert

There are differences due to how v2/v1 work, that being said we can probably simplify it. I can take a spin at it tomorrow morning and see what I can simplify on my end.

mattdurham avatar May 24 '22 20:05 mattdurham

This PR has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!

github-actions[bot] avatar Jun 25 '22 00:06 github-actions[bot]

haven't abandoned this :) other things popped up. Will revisit soon

jamesalbert avatar Jun 29 '22 21:06 jamesalbert

This PR has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!

github-actions[bot] avatar Jul 31 '22 00:07 github-actions[bot]

This PR has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!

github-actions[bot] avatar Sep 01 '22 00:09 github-actions[bot]