opentelemetry-js-contrib icon indicating copy to clipboard operation
opentelemetry-js-contrib copied to clipboard

feat: adding hook for adding span links to a lambda instrumentation span

Open peterabbott opened this issue 1 year ago • 9 comments

Which problem is this PR solving?

  • With the aws lambda instrumentation, when wrapping a lambda handler adds the ability to add span links. This is useful for batch processing where a message can come from different parent spans.

Short description of the changes

  • Allow links to be added to wrapped lambda spans

peterabbott avatar Feb 02 '24 08:02 peterabbott

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: peterabbott / name: Peter Abbott (b5b2d65b8f6e673bceec3797058304706c849188, 5a9f3cdca89cdd662586fc32e5c80c0be8f17c6d, c7c211b88079865f56f30e2a39928b807d8a9e6e)

@dyladan ping as I can see you did the last release of this module. Is there way to get eyes on a PR? Any reckons / comments here regarding adding links to an instrumented lambda root span?

peterabbott avatar Feb 11 '24 19:02 peterabbott

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 91.01%. Comparing base (cf034c8) to head (c7c211b). Report is 243 commits behind head on main.

Files Patch % Lines
...-instrumentation-aws-lambda/src/instrumentation.ts 50.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1922      +/-   ##
==========================================
- Coverage   91.02%   91.01%   -0.02%     
==========================================
  Files         146      146              
  Lines        7478     7480       +2     
  Branches     1497     1499       +2     
==========================================
+ Hits         6807     6808       +1     
- Misses        671      672       +1     
Files Coverage Δ
...-instrumentation-aws-lambda/src/instrumentation.ts 93.18% <50.00%> (-0.50%) :arrow_down:

codecov[bot] avatar Feb 22 '24 12:02 codecov[bot]

Sorry for the delay @peterabbott ! We are catching up on PRs. Generally this seems like a reasonable change, though we’re wondering if this is something generic that doesn't have to be configured multiple times by people with a callback, and instead can be implemented into the instrumentation for everyone to benefit? It may help if you provide the specific usage of this as an example of how it will be used. Also are there tests that can be added to verify this behavior?

As a note, we are also reaching out to get some input from AWS folks and unsure about their response to this as codeowners.

JamieDanielson avatar Jun 20 '24 16:06 JamieDanielson

@JamieDanielson thanks for the update.

The scenario we have is where a lambda uses an SQS event source mapping where the batch size > 1. The messages on the queue may have come from different parent traces. The current instrumentation looks at the Message for a parent trace id. The parent ID is saved on each message in the batch so when there is more than one unique parent trace id we lose the ability to link to a parent ID with respect to the current trace happening for the lambda invocation.

Using links was going to be our way to associate a message to a parent transaciton in these cases. Using the instrumentation approach is nice as it allows us to separate the code from tracing. Been a few issues with bugs in some of the contributed instrumentatino classes, with the isolated approach of Opentelemetry means it is easy to switch on/off.

On thought is to move the trace code to the handling of a individual message, which in the long run may be better, but we do miss a little bit of execution from when the message is handled and the splitting it out. There are some scenarios where we pre-processing message batches before the main execution so in those scenarios links become more useful.

Hope that makes sense?

I'll try to write some tests for what I am trying to describe

peterabbott avatar Jun 22 '24 04:06 peterabbott

@peterabbott can you please share the implementation of the hook that you plan to use in your lambda? is it something specific to your setup or is it a generic logic that is always expected to work? Since this behavior is part of semantic conventions, I wonder if we should always populate the links in the instrumentation for all users for SQS like aws-sdk instrumentation does, instead of asking to implement hooks as opt-in option.

blumamir avatar Jun 22 '24 05:06 blumamir

Using the AwsLambdaInstrumentation and the extractor hook we attempt to extract the propogated trace id from the SQS Records in the batch. This is fixed to SQS and wont apply to many calls. The snippets of relevant code

const sqsEventHandler = (records: SQSRecord[], awsEventContext: AwsContext) => {
  const parsedTraces = records
    .map((r: SQSRecord): Context | undefined => {
      return propagation.extract(ROOT_CONTEXT, r.messageAttributes, getter)
    })
    .reduce<Record<string, Context>>(
      (acc, ctx: Context | undefined) => {
        if (!ctx) {
          return acc
        }

        const spanCtx = trace.getSpanContext(ctx)
        if (!spanCtx) {
          return acc
        }

        // grouping by parent span trace id
        return {...acc, [spanCtx.traceId]: ctx}
      },
      {}
    )

  const parentTraces = Object.values(parsedTraces)

  // if only one event then we can do proper distributed tracing, otherwise a way to do links
  if (parentTraces?.length === 1) {
    diag.debug('single parent trace matched', {
      parentTraces
    })

    return parentTraces[0]
  }


  diag.debug('no or multiple parent traces', {parentTraces})
  // TODO: if > 1 add parent span links or using span link extractor suggestion from PR to add the link regardless 

  return ROOT_CONTEXT
}
///// --------
new AwsLambdaInstrumentation({
   eventContextExtractor: (event, awsEventContext): Context => {
        // Opinianted to SQS event handler
        if (Array.isArray(event?.Records)) {
          return sqsEventHandler(event.Records, awsEventContext)
        }
        // just return the context
        return ROOT_CONTEXT
      }
   }
 })

peterabbott avatar Jun 23 '24 02:06 peterabbott

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] avatar Aug 22 '24 06:08 github-actions[bot]

@blumamir wondering if there is any movement on this?

peterabbott avatar Sep 16 '24 22:09 peterabbott