Discussion(instrumentation-aws-sdk): SQS receive according to semantic conventions

Open blumamir opened this issue 4 years ago • 5 comments

This issue documents the implementation of the AWS SQS receive operation according to the current semantic conventions for messaging systems (Oct 2021). There is an active SIG working on the messaging systems specification, which will probably change the specification and how to handle situations where it's not possible to accurately extract perfect context.

receiveMessage

  • Messaging Attributes are added by this instrumentation according to the spec.
  • Additional "processing spans" are created for each message received by the application.
    If an application invokes receiveMessage and receives a batch of 10 messages, a single messaging.operation = receive span is created for the receiveMessage operation, and 10 messaging.operation = process spans are created, one for each message.
    Those processing spans are created by the library. This behavior is partially implemented; see the discussion below.
  • Sets the inter-process context correctly, so that additional spans created during processing are linked to their parent spans correctly.
    This behavior is partially implemented; see the discussion below.
  • Extracts trace context from SQS MessageAttributes, and sets the span's parent and links correctly according to the spec.
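As a rough illustration of the last point, the sketch below reads a W3C `traceparent` value out of a received message's MessageAttributes. It is not the instrumentation's actual code: the helper name `extractSpanContext` and the sample message are hypothetical, and it assumes the producer stored the header as a String attribute named `traceparent`. Note that SQS only returns message attributes that the receiver asks for via MessageAttributeNames.

```javascript
// Sketch only: parse a W3C `traceparent` value carried in SQS MessageAttributes.
// Assumption: the producer stored it as a String attribute named "traceparent".
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function extractSpanContext(message) {
  const attr = message.MessageAttributes && message.MessageAttributes.traceparent;
  if (!attr || typeof attr.StringValue !== 'string') return undefined;

  const match = TRACEPARENT_RE.exec(attr.StringValue.trim());
  if (!match) return undefined;

  const [, version, traceId, spanId, flags] = match;
  // Version "ff" and all-zero ids are invalid per the W3C Trace Context spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;

  return { traceId, spanId, traceFlags: parseInt(flags, 16), isRemote: true };
}

// Hypothetical message, shaped like one entry of a receiveMessage result.
const sampleMessage = {
  MessageId: 'msg1',
  Body: 'hello',
  MessageAttributes: {
    traceparent: {
      DataType: 'String',
      StringValue: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
    },
  },
};

console.log(extractSpanContext(sampleMessage));
```

The returned object could then serve as the parent or as a link target for the span created on the consumer side; a message without the attribute simply yields `undefined`.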

Processing Spans

According to the OpenTelemetry specification (and to reasonable expectations for trace structure), a user of this instrumentation library would expect to see one span for the operation of receiving a batch of messages from SQS, and then, for each message, a span with its own sub-tree for the processing of that specific message.

For example, if a receiveMessage call returned 2 messages:

  • msg1 resulting in storing something to a DB.
  • msg2 resulting in calling an external HTTP endpoint.

This would result in a DB span that is the child of the msg1 process span, and an HTTP span that is the child of the msg2 process span (as opposed to mixing all those operations under the single receive span, or starting a new trace for each of them).

Unfortunately, this is not so easy to implement in JS:

  1. The SDK calls a single callback for the whole batch of messages, and it's not straightforward to tell when the processing of each individual message starts and ends (and to set the context correctly for the spans that follow).
  2. If async/await is used, context can be lost when returning data from async functions, for example:
async function asyncRecv() {
  const data = await sqs.receiveMessage(recvParams).promise();
  // context of receiveMessage is set here
  return data;
}

async function poll() {
  const result = await asyncRecv();
  // context is lost when asyncRecv returns. following spans are created with root context.
  await Promise.all(
    result.Messages.map((message) => this.processMessage(message))
  );
}

The current implementation partially solves this issue by patching the map / forEach / filter functions on the Messages array of the receiveMessage result. This handles cases like the one above, but not situations where the processing follows other patterns (multiple map / forEach calls, index access to the array, other array operations, etc.). This is currently an open issue in the instrumentation.
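The patching idea can be sketched as follows. This is an illustration under assumptions, not the instrumentation's real code: Node's `AsyncLocalStorage` stands in for the OpenTelemetry context, `patchMessagesArray` is a hypothetical helper, and a real implementation would start and end a process span around each callback rather than just storing the message id.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Stand-in for the OpenTelemetry context: holds the "current message" for a callback.
const currentMessage = new AsyncLocalStorage();

// Hypothetical helper: shadow map/forEach/filter on this one array instance so
// each callback invocation runs inside a context bound to its message.
function patchMessagesArray(messages) {
  for (const method of ['map', 'forEach', 'filter']) {
    const original = Array.prototype[method];
    Object.defineProperty(messages, method, {
      configurable: true,
      enumerable: false,
      writable: true,
      value(callback, thisArg) {
        const wrapped = function (message, index, array) {
          return currentMessage.run({ messageId: message.MessageId }, () =>
            callback.call(this, message, index, array)
          );
        };
        return original.call(this, wrapped, thisArg);
      },
    });
  }
  return messages;
}

const messages = patchMessagesArray([{ MessageId: 'msg1' }, { MessageId: 'msg2' }]);

// Covered: each callback passed to map sees its own message context.
const seen = messages.map(() => currentMessage.getStore().messageId);
console.log(seen); // [ 'msg1', 'msg2' ]

// Not covered: index access bypasses the patched methods, so no context is set.
console.log(messages[0].MessageId, currentMessage.getStore()); // msg1 undefined
```

The second log line shows the limitation described above: any access path that does not go through one of the patched methods runs without a per-message context.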

blumamir avatar Oct 19 '21 13:10 blumamir

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] avatar Dec 20 '21 06:12 github-actions[bot]

This issue was closed because it has been stale for 14 days with no activity.

github-actions[bot] avatar Jan 03 '22 06:01 github-actions[bot]

Hi @dyladan @blumamir does this issue still occur if we repeatedly poll using setTimeout?

I have an issue with the instrumentation where there are a lot of nested spans. I am using forEach and, inside it, a setTimeout. It's causing the spans to be nested, with one root span having 10000+ spans for receiving messages.

Quoting the above code, I wrote similar code.

async function asyncRecv() {
  const data = await sqs.receiveMessage(recvParams).promise();
  return data;
}

async function poll() {
  const result = await asyncRecv();
  result.Messages.forEach((message, i) => {
    this.processMessage(message);
    setTimeout(() => {
      this._poll();
    }, this.pollInterval * i);
  });
}

aadharsh-rengarajan avatar Aug 05 '22 05:08 aadharsh-rengarajan

This comment https://github.com/open-telemetry/opentelemetry-js-contrib/issues/1477#issuecomment-1836903586 has a section "Discussion: do we want to support this?" which argues for dropping some of the special handling for "SQS ReceiveMessage" requests -- specifically dropping the attempts to automatically create "processing" spans when iterating over received messages. IIUC, the semantic conventions have since changed to no longer mention "processing" spans.

trentm avatar Feb 07 '24 17:02 trentm

I'm for dropping the process spans. In this case should the instrumentation just start a new receive span for every unique producer span context?
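One possible reading of "a receive span for every unique producer span context" is to bucket the received batch by the trace id and span id found in each message's `traceparent` attribute. The sketch below only does the grouping and creates no spans; `groupByProducerContext` and `makeMessage` are hypothetical names, and it assumes the same `traceparent` String attribute layout as W3C Trace Context.

```javascript
// Sketch: group received messages by the producer span context found in their
// `traceparent` attribute. Messages without a usable value share one group.
function groupByProducerContext(messages) {
  const groups = new Map();
  for (const message of messages) {
    const attr = message.MessageAttributes && message.MessageAttributes.traceparent;
    const value = attr && attr.StringValue;
    const parts = typeof value === 'string' ? value.split('-') : [];
    // Key on trace id + span id; fall back to a shared "no-context" bucket.
    const key = parts.length === 4 ? `${parts[1]}:${parts[2]}` : 'no-context';
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(message);
  }
  return groups;
}

// Hypothetical helper to build sample messages for the demo below.
function makeMessage(id, traceparent) {
  return {
    MessageId: id,
    MessageAttributes: { traceparent: { DataType: 'String', StringValue: traceparent } },
  };
}

const producerA = `00-${'a'.repeat(32)}-${'b'.repeat(16)}-01`;
const producerB = `00-${'c'.repeat(32)}-${'d'.repeat(16)}-01`;
const batch = [
  makeMessage('msg1', producerA),
  makeMessage('msg2', producerB),
  makeMessage('msg3', producerA),
];

const grouped = groupByProducerContext(batch);
console.log(grouped.size); // 2
```

Under this reading, the instrumentation would start one receive span per map entry, which trades the single batch-level span for one span per distinct producer.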

seemk avatar Feb 28 '24 18:02 seemk