opentelemetry-dotnet-contrib
opentelemetry-dotnet-contrib copied to clipboard
Add ConfluentKafka instrumentation
Fixes #1485.
Changes
Add an instrumentation package for Confluent.Kafka library. Implement tracing and propagation through Kafka message headers Implement metrics
For significant contributions please make sure you have completed the following items:
- [ ] Appropriate
CHANGELOG.md
updated for non-trivial changes - [ ] Design discussion issue #
- [ ] Changes in public API reviewed
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: g7ed6e / name: Guillaume Delahaye (9d0122cd9a5589e2c28986d3ad6d47456201865f, bae86e91c9215ff099be92636a0c6da1efefeb3b, b429a6f44ea1601a6a4a9dedc123053b33b462e0, 5afe9e052ed64fe50e4c1bace90c43c977a5d47e)
- :white_check_mark: login: CodeBlanch / name: Mikel Blanchard (f3c4df8b5e88dcee7899297011b71478f98c9148, cdd2470cf59ee5fb12049ef2acf038b0f3a44040, a4fc992e507bea855489b7034b8129a8ec0c6120, 9be7c1874067bfb2082f526895b4498110a797b5, 400fdc15ccc3521bac45b6fa64f873dd402c77b9, 4256bb1fba061e0f98a720aa7ace9f3a0e794fe8)
- :white_check_mark: login: vishweshbankwar / name: Vishwesh Bankwar (0f5bd952cb8a6977d07c4429cb8750f65cf4346c)
- :white_check_mark: login: Kielek / name: Piotr Kiełkowicz (14ad8c839d5f3bb332dd1de807e64244c4ca810f)
On the Tracing side, would it be better if we just finished up https://github.com/confluentinc/confluent-kafka-dotnet/pull/1838 (and also made a similar PR for Consumer)? That way we wouldn't need to wrap IProducer / IConsumer. The OpenTelemetry package would just enable the ActivitySource from the Kafka library.
cc @cijothomas
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 72.40%. Comparing base (
71655ce
) to head (14ad8c8
). Report is 349 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1493 +/- ##
==========================================
- Coverage 73.91% 72.40% -1.51%
==========================================
Files 267 304 +37
Lines 9615 11318 +1703
==========================================
+ Hits 7107 8195 +1088
- Misses 2508 3123 +615
Flags with carried forward coverage won't be shown. Click here to find out more.
On the Tracing side, would it be better if we just finished up confluentinc/confluent-kafka-dotnet#1838 (and also made a similar PR for Consumer)? That way we wouldn't need to wrap IProducer / IConsumer. The OpenTelemetry package would just enable the ActivitySource from the Kafka library.
cc @cijothomas
Yes. If that PR finishes, then there may not be even a need of an instrumentation library here, as users can just add AddSource themselves.
On the Tracing side, would it be better if we just finished up confluentinc/confluent-kafka-dotnet#1838 (and also made a similar PR for Consumer)? That way we wouldn't need to wrap IProducer / IConsumer. The OpenTelemetry package would just enable the ActivitySource from the Kafka library.
cc @cijothomas
Yes. If that PR finishes, then there may not be even a need of an instrumentation library here, as users can just add AddSource themselves.
@eerhardt @cijothomas I agree but I think until that it would be great to get tracing support.
On the Tracing side, would it be better if we just finished up confluentinc/confluent-kafka-dotnet#1838 (and also made a similar PR for Consumer)? That way we wouldn't need to wrap IProducer / IConsumer. The OpenTelemetry package would just enable the ActivitySource from the Kafka library.
cc @cijothomas
Yes. If that PR finishes, then there may not be even a need of an instrumentation library here, as users can just add AddSource themselves.
@eerhardt @cijothomas I agree but I think until that it would be great to get tracing support.
I am not quite sure if the efforts of maintaining a library is worth it, unless we know that the native tracing support will take a very long time. If it is coming in few months, wouldn't that be better to wait?
On the Tracing side, would it be better if we just finished up confluentinc/confluent-kafka-dotnet#1838 (and also made a similar PR for Consumer)? That way we wouldn't need to wrap IProducer / IConsumer. The OpenTelemetry package would just enable the ActivitySource from the Kafka library.
cc @cijothomas
Yes. If that PR finishes, then there may not be even a need of an instrumentation library here, as users can just add AddSource themselves.
@eerhardt @cijothomas I agree but I think until that it would be great to get tracing support.
I am not quite sure if the efforts of maintaining a library is worth it, unless we know that the native tracing support will take a very long time. If it is coming in few months, wouldn't that be better to wait?
@cijothomas Having a look back to the original issue https://github.com/confluentinc/confluent-kafka-dotnet/issues/1269 I notice PRs to auto instrument Kafka has been linked here https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/issues/2854. I think it could make sense to offer a per producer / per consumer optin approach that fit dotnet/aspire integration https://github.com/dotnet/aspire/pull/951 which is the primary motivation of this contribution. Notice that implementation from auto instrumentation and the current PR are not aligned but it could be easily done to offer consistency.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
On the Tracing side, would it be better if we just finished up confluentinc/confluent-kafka-dotnet#1838 (and also made a similar PR for Consumer)? That way we wouldn't need to wrap IProducer / IConsumer. The OpenTelemetry package would just enable the ActivitySource from the Kafka library.
cc @cijothomas
Yes. If that PR finishes, then there may not be even a need of an instrumentation library here, as users can just add AddSource themselves.
@eerhardt @cijothomas I agree but I think until that it would be great to get tracing support.
I am not quite sure if the efforts of maintaining a library is worth it, unless we know that the native tracing support will take a very long time. If it is coming in few months, wouldn't that be better to wait?
@cijothomas Having a look back to the original issue confluentinc/confluent-kafka-dotnet#1269 I notice PRs to auto instrument Kafka has been linked here open-telemetry/opentelemetry-dotnet-instrumentation#2854. I think it could make sense to offer a per producer / per consumer optin approach that fit dotnet/aspire integration dotnet/aspire#951 which is the primary motivation of this contribution. Notice that implementation from auto instrumentation and the current PR are not aligned but it could be easily done to offer consistency.
Hi @cijothomas. May I ask for a review ? What do you think about maintaining this code until official support comes out, then the library can be deprecated ?
Thanks for submitting this. I'm reviewing mostly from the point of view of a contributor to messaging semantic conventions.
Hi @pyohannes Thank you for your review. I should be able to get back to this and incorporate your suggestions next week.
On the Tracing side, would it be better if we just finished up confluentinc/confluent-kafka-dotnet#1838 (and also made a similar PR for Consumer)? That way we wouldn't need to wrap IProducer / IConsumer. The OpenTelemetry package would just enable the ActivitySource from the Kafka library.
cc @cijothomas
Yes. If that PR finishes, then there may not be even a need of an instrumentation library here, as users can just add AddSource themselves.
@eerhardt @cijothomas I agree but I think until that it would be great to get tracing support.
I am not quite sure if the efforts of maintaining a library is worth it, unless we know that the native tracing support will take a very long time. If it is coming in few months, wouldn't that be better to wait?
@cijothomas Having a look back to the original issue confluentinc/confluent-kafka-dotnet#1269 I notice PRs to auto instrument Kafka has been linked here open-telemetry/opentelemetry-dotnet-instrumentation#2854. I think it could make sense to offer a per producer / per consumer optin approach that fit dotnet/aspire integration dotnet/aspire#951 which is the primary motivation of this contribution. Notice that implementation from auto instrumentation and the current PR are not aligned but it could be easily done to offer consistency.
Hi @cijothomas. May I ask for a review ? What do you think about maintaining this code until official support comes out, then the library can be deprecated ?
Do you know any rough timelines when the official support comes ? If it is in less than 6 months, the maintainence+deprecation overhead here won't feel like worth.. If its an year or so at least before official support comes, it is okay to keep this PR moving.
On the Tracing side, would it be better if we just finished up confluentinc/confluent-kafka-dotnet#1838 (and also made a similar PR for Consumer)? That way we wouldn't need to wrap IProducer / IConsumer. The OpenTelemetry package would just enable the ActivitySource from the Kafka library.
cc @cijothomas
Yes. If that PR finishes, then there may not be even a need of an instrumentation library here, as users can just add AddSource themselves.
@eerhardt @cijothomas I agree but I think until that it would be great to get tracing support.
I am not quite sure if the efforts of maintaining a library is worth it, unless we know that the native tracing support will take a very long time. If it is coming in few months, wouldn't that be better to wait?
@cijothomas Having a look back to the original issue confluentinc/confluent-kafka-dotnet#1269 I notice PRs to auto instrument Kafka has been linked here open-telemetry/opentelemetry-dotnet-instrumentation#2854. I think it could make sense to offer a per producer / per consumer optin approach that fit dotnet/aspire integration dotnet/aspire#951 which is the primary motivation of this contribution. Notice that implementation from auto instrumentation and the current PR are not aligned but it could be easily done to offer consistency.
Hi @cijothomas. May I ask for a review ? What do you think about maintaining this code until official support comes out, then the library can be deprecated ?
Do you know any rough timelines when the official support comes ? If it is in less than 6 months, the maintainence+deprecation overhead here won't feel like worth.. If its an year or so at least before official support comes, it is okay to keep this PR moving.
I have no idea. The issue tracking this feature is https://github.com/confluentinc/confluent-kafka-dotnet/issues/1269 and is not updated. It looks like this feature is not planned yet.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@g7ed6e Is it possible for you to follow up with owners on the status of https://github.com/confluentinc/confluent-kafka-dotnet/issues/1269 once more? Having the instrumentation supported natively is the ideal scenario and many libraries are moving towards that. If the feature is not planned then we can consider adding this instrumentation here.
Note that you would need to follow the process outlined to get started. Also could you clarify who will be the owner(s) for this component.
cc: @open-telemetry/dotnet-contrib-maintainers
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@vishweshbankwar / @open-telemetry/dotnet-contrib-maintainers
May i ask you to reopen this PR ? I'm currently still following up the status with owners.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@vishweshbankwar @cijothomas latest move in https://github.com/confluentinc/confluent-kafka-dotnet/issues/1269 states that tracing won't be covered. Should we reconsider adding tracing through this repo?
@vishweshbankwar @cijothomas latest move in confluentinc/confluent-kafka-dotnet#1269 states that tracing won't be covered. Should we reconsider adding tracing through this repo?
@g7ed6e - Thanks for following up.
Should we reconsider adding tracing through this repo?
Sounds good to me. Assuming you would be owing this package?
@vishweshbankwar @cijothomas latest move in confluentinc/confluent-kafka-dotnet#1269 states that tracing won't be covered. Should we reconsider adding tracing through this repo?
@g7ed6e - Thanks for following up.
Should we reconsider adding tracing through this repo?
Sounds good to me. Assuming you would be owing this package?
Thanks @vishweshbankwar. I rebased the branch and joined the slack otel-dotnet channel.
@Kielek I'm having issues sharing services (InstrumentedConsumerBuilder<TKey, TValue>
, InstrumentedProducerBuilder<TKey, TValue>
and their respective named options) between IServiceProvider
instances of MeterProviderBuilder
and TracerProviderBuilder
.. it looks like each builder has its own IServiceCollection
, I'm thinking to move InstrumentedXXXBuilder<TKey, TValue>
instances in static instances of ConcurrentDictionary<(string name, Type KeyType, Type ValueType), XXXBuilder<TKey, TValue>>
. i will give this a try.
edit: @CodeBlanch is there a way to share services between MeterProviderBuilder
and TracerProviderBuilder
?
@g7ed6e
edit: @CodeBlanch is there a way to share services between
MeterProviderBuilder
andTracerProviderBuilder
?
The short answer is: No
The long answer is: Depends
For users doing...
appBuilder.Services.AddOpenTelemetry()
.WithTracing(tracing => tracing.AddKafkaConsumerInstrumentation())
.WithMetrics(metrics => metrics.AddKafkaConsumerInstrumentation());
The IServiceCollection
+ IServiceProvider
will be the one tied to host and everything is shared.
For users doing...
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
.AddKafkaConsumerInstrumentation()
.Build();
using var meterProvider = Sdk.CreateMeterProviderBuilder()
.AddKafkaConsumerInstrumentation()
.Build();
They are independent.
Hopefully there will be a new API soon (https://github.com/open-telemetry/opentelemetry-dotnet/pull/5325) for that second case which will also be shared:
using var openTelemetry = OpenTelemetrySdk.Create(builder => builder
.WithTracing(tracing => tracing.AddKafkaConsumerInstrumentation())
.WithMetrics(metrics => metrics.AddKafkaConsumerInstrumentation()));
If you really need them to be singletons then you'll have to do something static
. If having singletons is just a nice-to-have I would just use the IServiceProvider
normally and let it sort out what to do.
Remember users can set up multiple pipelines:
using var tracerProvider1 = Sdk.CreateTracerProviderBuilder()
.AddSource("SomeHighPrioritySource")
.AddKafkaConsumerInstrumentation()
.AddOtlpExporter(...send one place...)
.Build();
using var tracerProvider2 = Sdk.CreateTracerProviderBuilder()
.AddSource("SomeLowPrioritySource")
.AddKafkaConsumerInstrumentation()
.AddZipkinExporter(...send some other place...)
.Build();
In which case would having static singletons actually be a problem?
@g7ed6e I pushed some changes to try and get this over the finish line 😄
-
Missing
MinVerTagPrefix
in the.proj
: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1493/commits/f3c4df8b5e88dcee7899297011b71478f98c9148 -
Static meters/counters: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1493/commits/a4fc992e507bea855489b7034b8129a8ec0c6120
The way it was before users could end up with multiple meters in their process with the same name & version. That is a no-no in the OTel spec. SDK should actually be fine with it but it seemed simpler to just make them static.
Since I was in there I also switched the measurements to use
TagList
so they become allocation free. Before for every measurement it was allocating an enumerator and array. Expensive! -
Lit up the integration tests in CI: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1493/commits/400fdc15ccc3521bac45b6fa64f873dd402c77b9
The tests are currently failing. Would you please look into that?
@Kielek @vishweshbankwar Some concerns with the tests being added on this PR. There are no unit tests, only integration tests. Usually we have a mix of both? Not a blocker for me so long as the CI runs what is there but I think ideally there would be some unit tests too. Could be a follow-up effort.
@g7ed6e I pushed some changes to try and get this over the finish line 😄
- Missing
MinVerTagPrefix
in the.proj
: f3c4df8- Static meters/counters: a4fc992 The way it was before users could end up with multiple meters in their process with the same name & version. That is a no-no in the OTel spec. SDK should actually be fine with it but it seemed simpler to just make them static. Since I was in there I also switched the measurements to use
TagList
so they become allocation free. Before for every measurement it was allocating an enumerator and array. Expensive!- Lit up the integration tests in CI: 400fdc1 The tests are currently failing. Would you please look into that?
@Kielek @vishweshbankwar Some concerns with the tests being added on this PR. There are no unit tests, only integration tests. Usually we have a mix of both? Not a blocker for me so long as the CI runs what is there but I think ideally there would be some unit tests too. Could be a follow-up effort.
@CodeBlanch I pulled your changes and pushed updates to the integration tests.
@g7ed6e Seems like the integration tests are still having an issue: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/actions/runs/9683812113/job/26720211596?pr=1493
@g7ed6e Seems like the integration tests are still having an issue: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/actions/runs/9683812113/job/26720211596?pr=1493
~~@CodeBlanch it looks like multiple Kafka containers are created. I don't face this issue locally I will check CI scripts.~~
@CodeBlanch : I fixed and ran the integration tests locally using the compose file and that's ok. It should be fine in CI too.
I do still think we could do more to help connect traces when consuming (#1493 (comment)) but LGTM that could be done as follow-up work.
@CodeBlanch I missed that comment. I propose to implement it in a follow up PR.
@Kielek @vishweshbankwar I think this is good to go but since it is touching a lot of the repo I would like another maintainer to approve before merging.