opentelemetry-collector-contrib
Service name based routing for trace collector span metrics
Description: Adds support for service-name-based routing in the loadbalancing exporter. Instead of every collector host receiving spans for the same service (and each producing its own span metrics), this feature establishes a one-to-one mapping between a service and a collector host. Routing stays traceID-based by default when nothing is configured; to enable service-based routing, add the following to the loadbalancing exporter config:
Closes #12652
routing_key: "service"
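For reference, a minimal sketch of a loadbalancing exporter configuration with service-based routing is shown below; the backend hostnames and OTLP settings are placeholders.

```yaml
exporters:
  loadbalancing:
    # Route spans by service name instead of the default traceID.
    routing_key: "service"
    protocol:
      otlp:
        # The options of the regular OTLP exporter apply here.
        timeout: 1s
    resolver:
      static:
        # Placeholder tier-2 collector endpoints.
        hostnames:
          - collector-1:4317
          - collector-2:4317
```

With `routing_key: "service"`, all spans that share a service name are consistently sent to the same backend, so only one downstream collector emits span metrics for that service.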
Testing: Added unit tests covering the different routing types, e.g. service-based routing vs. the default traceID-based routing.
@jpkrohling Thanks a lot for the review. Made the suggested changes and pushed them.
@albertteoh Thanks a lot for your review, it was really helpful. Updated the PR to address your comments. Thanks in advance!
@jpkrohling @albertteoh Thanks again. Addressed the review comments
Are you able to answer @jpkrohling's question: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/12421#discussion_r925995997? I'm quite curious about how and why service-name routing is used.
@albertteoh @jpkrohling We are currently using the spanmetrics processor to generate RED metrics and ingest them into Prometheus. These metrics are scoped to a service+operation context, so whether we generate RED metrics on a single-tier collector (with multiple instances) or on two tiers behind a load-balancing exporter, each collector instance would generate metrics for the same service+operation name, causing label collisions in Prometheus. To avoid this, we would like to be able to do collector 1 -> service-based load balancing -> collector 2, where the spanmetrics processor generates the metrics. Does this clarify?
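To make the two-tier setup concrete, a rough sketch of a tier-2 collector (the tier that runs the spanmetrics processor behind the service-based loadbalancing exporter) might look like the config below. The receiver, exporter, and endpoint choices are placeholders, and the exact spanmetrics wiring should follow the spanmetrics processor README.

```yaml
# Tier-2 collector sketch: each instance receives all spans for a given set
# of services, so RED metrics for a service come from exactly one instance.
receivers:
  otlp:
    protocols:
      grpc:

processors:
  spanmetrics:
    # Name of the exporter the generated metrics are handed to.
    metrics_exporter: prometheus
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"  # placeholder scrape endpoint
  logging:                    # placeholder traces backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```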
Yes, I guess it does make sense. Can you add this to the readme? I'm sure people will find it useful to understand why this exists in the first place.
Thanks @vjsamuel for the clarification. @jpkrohling Updated the README.
Looks good to me. I'll wait for @albertteoh's feedback before moving forward with this.
Does this clarify?
Yup, makes sense now, thanks @vjsamuel. I wonder if Prometheus federation would help in this case (I have zero experience with it, but have just heard it mentioned a few times).
I wonder if Prometheus federation
At smaller scale, maybe. But if we have several hundreds of services, then depending on the granularity at which the processor generates the RED metrics, it will become harder and harder for Prometheus federation to handle.
@albertteoh or @jpkrohling Can you please help merge the PR?
This would be very useful to me. Hopefully we see it in soon.
@jpkrohling Thanks. I modified the README. Can you please check and let me know if anything specific still needs to be added?
@jpkrohling Could you kindly help move this forward? Are there more approvals required, if I may ask?
Linting is failing:
Error: trace_exporter_test.go:189:18: Error return value of `p.Shutdown` is not checked (errcheck)
defer p.Shutdown(context.Background())
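One common way to satisfy errcheck here (a sketch assuming a testify-based test, with `p` and `t` coming from the surrounding test) is to check the error inside the deferred function:

```go
// Instead of: defer p.Shutdown(context.Background())
defer func() {
	// Fail the test if shutdown returns an error, so errcheck is satisfied.
	require.NoError(t, p.Shutdown(context.Background()))
}()
```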
- :white_check_mark: login: aishyandapalli
The committers listed above are authorized under a signed CLA.
@jpkrohling Thanks for verifying. Fixed the lint issue. It looks like all the builds are passing now.