opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

Service name based routing for trace collector span metrics

Open aishyandapalli opened this issue 2 years ago • 14 comments

Description: Adding a feature support for service name based routing for loadbalancing exporters. Instead of repeating the same service name in all hosts for span metrics collector, this feature will have one-one mapping between a service and collector host. This will have traceId routing by default if nothing is mentioned. If, we need a service based routing, it should be added in loadbalancing exporter config as below

Closes #12652

routing_key: "service"

Testing: Added unit test cases to check different routing types for e.g service based vs non service based which is traceId by default

aishyandapalli avatar Jul 14 '22 01:07 aishyandapalli

@jpkrohling Thanks a lot for the review. Made the changes suggested and pushed.

aishyandapalli avatar Jul 20 '22 01:07 aishyandapalli

@albertteoh Thanks a lot for your review, it was really helpful. Update the PR addressing your comments. Thanks in advance!

aishyandapalli avatar Jul 20 '22 18:07 aishyandapalli

@jpkrohling @albertteoh Thanks again. Addressed the review comments

aishyandapalli avatar Jul 20 '22 23:07 aishyandapalli

@jpkrohling @albertteoh Thanks again. Addressed the review comments

Are you able to answer @jpkrohling's question: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/12421#discussion_r925995997? I'm quite curious about how and why service-name routing is used.

albertteoh avatar Jul 21 '22 09:07 albertteoh

@albertteoh @jpkrohling we are currently using spanmetrics processor to generate red metrics and ingest into prometheus. these metrics are in the scope of a service+operation context. so if we either generate red metrics on a single tiered collector or 2 tiers with a load balancing exporter, each of the collectors would generate metrics for the service+operation name. this would cause label collisions in prometheus. to be able to avoid this we would like to be able to do collector 1 -> service based load balancing -> collector 2 where spanmetrics processor generates metrics. does this clarify?

vjsamuel avatar Jul 21 '22 18:07 vjsamuel

Yes, I guess it does make sense. Can you add this to the readme? I'm sure people will find it useful to understand why this exists in the first place.

jpkrohling avatar Jul 21 '22 18:07 jpkrohling

Thanks @vjsamuel for the clarification. @jpkrohling Updated the readme

aishyandapalli avatar Jul 21 '22 21:07 aishyandapalli

Looks good to me. I'll wait for @albertteoh's feedback before moving forward with this.

jpkrohling avatar Jul 22 '22 13:07 jpkrohling

does this clarify?

Yup, makes sense now, thanks @vjsamuel. I wonder if prometheus federation would help in this case (I have zero experience with it, but just heard about it mentioned a few times).

albertteoh avatar Jul 23 '22 12:07 albertteoh

I wonder if prometheus federation

at smaller scale maybe. but if we have several 100s of services, depending on what granularity the RED metrics are being generated by the processor, it will become harder and harder for prom fed to handle.

vjsamuel avatar Jul 25 '22 23:07 vjsamuel

@albertteoh or @jpkrohling Can you please help merge the PR?

aishyandapalli avatar Jul 27 '22 17:07 aishyandapalli

This would be very useful to me. Hopefully we see it in soon.

crobertson-conga avatar Jul 29 '22 18:07 crobertson-conga

@jpkrohling Thanks. I modified the readme. Can you please check and let me know anything else to be added in specific?

aishyandapalli avatar Aug 02 '22 00:08 aishyandapalli

@jpkrohling could you kindly help move this forwad? are there more approvals required if i may ask?

vjsamuel avatar Aug 04 '22 16:08 vjsamuel

Linting is failing:

Error: trace_exporter_test.go:189:18: Error return value of `p.Shutdown` is not checked (errcheck)
	defer p.Shutdown(context.Background())

jpkrohling avatar Aug 08 '22 17:08 jpkrohling

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: aishyandapalli (2f239114bdbe06796a6882a74be373ea0d915c9d, 315e30a1701f34fff1a7adf491c962ba7c789c5b, 9ecc1183af487289f8e0c02670de6961c10176b7, 0f10cd9a2278b969f68e9c72150231828de0e70f, bc5536aa5590a7dd5da29fa6678773a73b7a89ba, 8043b2ef67138ef3611894943519ba5c23cac90a, 73cb631ca6cea1fba3d04a2fba3821764bac4ac2, e6ccedd49e1f8135f7cd3e4515654705133e46a2, 905fa21bd7a747761aa1085cea88db57d4f0e7e4, 23a6839f4cf7389791c42ee7855eb5836cb96704, b59750297f82b9e8e6b4cd2a37ab2749b909f7b7, 8519b050d09917d4d1a86932ddb3793aca470516, b2168bc7ad65c2bce44a3774a8bf09571c8ad71a, ba60f5e940fc79ef8c11ae395e77d9946123c24f, 5ac761d1cdb09a7611e830402923a597a33da20b, 5b327ad616a17463fb9cc64c9a89da687f464d20, b4abe151795007ed13a208930d54f1c6772878df, 88bfb07d5245aaa62825cb2ba804a3c1b5426159, 95abef3e5b3f36b982eb2827ecd5515d4099fc0b)

@jpkrohling Thanks for verifying. Fixed the lint issue. Looks like all the builds are success now

aishyandapalli avatar Aug 10 '22 17:08 aishyandapalli