opentelemetry-java-instrumentation icon indicating copy to clipboard operation
opentelemetry-java-instrumentation copied to clipboard

Implementation for the latest Messaging Semantic conventions.

Open cbos opened this issue 10 months ago • 1 comments

Update Messaging span with the latest semantic conventions. https://github.com/open-telemetry/semantic-conventions/blob/v1.30.0/docs/messaging/messaging-spans.md

By default nothing changes, it adheres to this:

  • SHOULD introduce an environment variable OTEL_SEMCONV_STABILITY_OPT_IN in the existing major version which is a comma-separated list of values. The list of values includes:
    • messaging - emit the new, stable messaging conventions, and stop emitting the old experimental messaging conventions that the instrumentation emitted previously.
    • messaging/dup - emit both the old and the stable messaging conventions, allowing for a seamless transition.
    • The default behavior (in the absence of one of these values) is to continue emitting whatever version of the old experimental messaging conventions the instrumentation was emitting previously.
    • Note: messaging/dup has higher precedence than messaging in case both values are present

cbos avatar Feb 04 '25 15:02 cbos

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: cbos / name: Cees Bos (f9d51fa90fd62c88ec8e7a753d053accbadd57e7, 91ae29bc5149325bfdfa0ec50d69ff711a836942, f18f9819a08d445edd23426a6fc469b75fcd980d, 802882a0827e606ad2f1b12d5ac9ae7a716a80f1, 1cf86e5c8c86209c0b95b9999801fa3802dc79b7, d1594a1886031608944fbf7f700f94f178cb19dd, 118bf1bf4b8520c25d05273bb8860ecaaf6793e1, 27d3a1494d5022105f01a97ceea6ec84e28fcc4f, eab19039d020cbaacaf0d5dce779bccff7881ca7, bacd3d337248ed02c07f0e9f1a22ddee4b6208b6, 3ecd05d2cd1b988e689fa4e04a91a068889e75db, 138cce0f2f4f046bdd1750579b4b466b98d56ddf, b2dd7c5c91d40658b8b3e27387bc195f41bb5999, 79c0d1ee4b794d898e1810c4d8a1bcede4928f8e, c3d09aa9d8275e715d08daec464e4a616e225d3a, aa83bad13c4f279fbc4541dfd8b31769b2fa5be4, a086a829f1613bb3e707ba4320a45a2fd8693d82, 63bc58b20cea35616341adbcfc43eeb316d8bcc3, 067e7a390b35be8a361f00091a62135d60811f24, 4a3ac25d99c34e1414aeeffa7800dce019141c9a, 1259752b95b231f7cc70c259db124158804bd76a, 7d266664e3d095ecea4dd2331ff0313cf4b8264a, 851d6682e7abadb5805980e809107bfe62b23335, fcfd92b60950de995a8584556c8784a0f55e767c, b1edd9651e7f8f7f08f20d07d85d57a1398a0350, eca37c79a0a585febcbd9b37e140409b007e9ea5, 18f46bc16cf0338f823272f936fcd7fbd4f1ce7f, fcaa3ca16b428d800db1b752ec5b6eb5497ae758, 44f55f2723c9f28fb343d20141286f544d57972a, b7a4cb6d50bffd73ca58c3b4265383bc61176481)

@cbos are you doing this for fun or do you intend to get this merged?

laurit avatar Mar 24 '25 12:03 laurit

@laurit Thanks for reaching out. My intent is to get the merged, but I had difficulties to find out how the CLA should be handled. That took an awful lot of time to get that sorted out in the organisation I am currently active, but that is arranged now. I signed the CLA.

cbos avatar Mar 31 '25 13:03 cbos

@laurit Thanks for reaching out. My intent is to get the merged, but I had difficulties to find out how the CLA should be handled. That took an awefull lot of time to get that sorted out in the organisation I am currently active, but that is arranged now. I signed the CLA.

cbos avatar Mar 31 '25 13:03 cbos

@laurit Thanks for the review, next week I will go through all details and update the PR

cbos avatar Apr 04 '25 10:04 cbos

@cbos do you need help getting this over the finishing line?

zeitlinger avatar May 27 '25 13:05 zeitlinger

@zeitlinger Thanks for reaching out. Due to priority incidents on the project I am working on, I was not able to catch up with the review comments. But I will work on this in the coming days. I think I have to split the review in 2 parts.

cbos avatar Jun 02 '25 13:06 cbos

While I was working on the Messaging Semantic conventions I found out that there are 2 additional problems. There are related to span links rather than parent-child relations.

Problem 1 - context propagation: image

Span links, only link one span to the other. But the context is not propagated. Especially baggage headers are mend for context propagation. But that is missing right now.

That is what I implemented in this PR as well image I created a class for that instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/internal/PropagatorBasedBaggageExtractor.java used that from the JMS implementation. @laurit you made a comment on this util class to make it a separate PR.

Problem 2 - no child spans: image

JMS has 2 ways to receive messages. One of these options (which is the most I think, especially in older software) causes that you don't have child spans. These 'child spans' do not have a context and will become standalone.

That is solved in this PR as well: image

I created a setting for that as well: otel.instrumentation.jms.experimental.consumer-process-telemetry.enabled, which is default disabled.


@laurit @zeitlinger What is your advise on how to continue with this? As far as I can see, only the new Messaging Semantic conventions is only half solution. The other 2 parts are also needed to have a full functional solution, otherwise you miss essential information or have crippled trace. Taking out the changes for these 2 parts will take time as well.

Locally I have an OpenTelemetry demo setup with some changes in which I included JMS messaging as well to test/demo all the changes of the new messaging conventions. Based on that I found the 2 problems as described above.

I can share that setup as well if you like.

cbos avatar Jun 04 '25 09:06 cbos

Span links, only link one span to the other. But the context is not propagated. Especially baggage headers are mend for context propagation. But that is missing right now.

@trask has this come up before in the spec? Imo you should definitely push this in a different PR. If this is not a speced behavior you'll either need to work to get this into the spec or add it as an experimental, disabled by default, feature.

JMS has 2 ways to receive messages. One of these options (which is the most I think, especially in older software) causes that you don't have child spans. These 'child spans' do not have a context and will become standalone.

To me this is a new feature not related to semconv update and thus should be put in a separate PR.

As far as I can see, only the new Messaging Semantic conventions is only half solution.

The thing with the new semantic conventions is that they can only be enabled with a new major release which means that they could remain behind an opt-in flag for quite a while. The other 2 features aren't really connected with the semconv update and could be implemented separately. Having separate features in separate PRs could also make the review easier.

laurit avatar Jun 05 '25 16:06 laurit

@laurit Thanks for your feedback, I will work on it to separate the two other parts.

cbos avatar Jun 05 '25 19:06 cbos

Span links, only link one span to the other. But the context is not propagated. Especially baggage headers are mend for context propagation. But that is missing right now.

I suspect this is expected, since baggage is typically associated with a single trace, and in the messaging modeling there are multiple traces involved.

I'd probably recommend opening an issue in semantic-conventions repo seeking any clarifications, since the messaging semconv is not simple (and it's not completely final).

trask avatar Jun 11 '25 01:06 trask

Span links, only link one span to the other. But the context is not propagated. Especially baggage headers are mend for context propagation. But that is missing right now.

I suspect this is expected, since baggage is typically associated with a single trace, and in the messaging modeling there are multiple traces involved.

I'd probably recommend opening an issue in semantic-conventions repo seeking any clarifications, since the messaging semconv is not simple (and it's not completely final).

@trask Thanks for your reply.

Based on what I read at https://github.com/open-telemetry/semantic-conventions/blob/main/docs/messaging/messaging-spans.md#context-propagation and with the old/current semantic conventions baggage propagation just normally works ends up in the consumer span.

I will take out the change from this PR and I will create a ticket on this repo. Then we can take it forward from that point. And if clarification is needed on semantics level, we can create a ticket for that as well.


Update: I created https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/14024 for this

cbos avatar Jun 12 '25 11:06 cbos

@trask / @laurit / @zeitlinger As I mentioned on #14024 I missed some important part in the semantic conventions. My impression was that with the new convention it 'always' should create span links INSTEAD of using the context as parent.

These conventions use spans links as the default mechanism to correlate producers and consumer(s) because:

  • It is the only consistent trace structure that can be guaranteed, given the many different messaging systems models available.

But somehow, I missed this part:

Message creation context as parent of "Process" span Exclusively for single messages scenarios, the "Process" span MAY use the message's creation context as its parent, thus achieving a direct parent-child relationship between producer and consumer(s). Instrumentations SHOULD document whether they use the message creation context as a parent for "Process" spans and MAY provide configuration options allowing users to control this behavior.

That has consequenses for this implementation, I will rework the implementation based on that.

  • So if the new semantic conventions are enabled it should always create the span links, but still can use the same as parent as well.
  • Context as parent is still possible (but not default with the new conventions I think).

Some questions for the implementation: Currently otel.instrumentation.messaging.experimental.receive-telemetry.enabled is a setting. If that is enabled, span links are created. (see https://opentelemetry.io/docs/zero-code/java/agent/instrumentation/#capturing-consumer-message-receive-telemetry-in-messaging-instrumentations) But according to the new specs span links are not about 'receive' but about 'process'. That would opt for a new setting with 'process' in the name. Should the behavior 'single message processing' be configurable per instrumentation implementation, so separately for JMS and RabbitMQ and Kafka, or a generic setting were it gets enabled/disabled for all types? Mostlikely an application does not use all these messaging techniques at the same time.

The receive-telemetry setting is false by default. Should the default be 'true' with the new semantic conventions?

cbos avatar Jun 19 '25 11:06 cbos

I think we may not need a new configuration property if we can put all the new behavior behind otel.semconv-stability.opt-in=messaging.

Exclusively for single messages scenarios, the "Process" span MAY use the message's creation context as its parent, thus achieving a direct parent-child relationship between producer and consumer(s)

I support Java instrumentation choosing to implement this MAY. We can add opt-out configuration property in the future if someone requests it.

trask avatar Jun 19 '25 20:06 trask

I think we may not need a new configuration property if we can put all the new behavior behind otel.semconv-stability.opt-in=messaging.

Exclusively for single messages scenarios, the "Process" span MAY use the message's creation context as its parent, thus achieving a direct parent-child relationship between producer and consumer(s)

I support Java instrumentation choosing to implement this MAY. We can add opt-out configuration property in the future if someone requests it.

@trask Thanks for your clarification. I checked the PR again, and as soon as otel.semconv-stability.opt-in=messaging or otel.semconv-stability.opt-in=dup/messaging the new names are used (and with dup also some old attributes are still available).

In both cases the emitStableMessagingSemconv() is true and based on that span links are created. But that now 'only' creates span links, which means that with otel.semconv-stability.opt-in=dup/messaging` no parent-child spans are created anymore.

What it your expectation for the 'dup/messaging' situation?

I think that is only remaining point to address, all other parts are addressed as far as I can see.

@laurit, based on your earlier comments I replied, but based on that some questions are still open. Can you have a look at them so we can finish these review comments as well?

cbos avatar Jun 30 '25 08:06 cbos