opentelemetry-ruby-contrib icon indicating copy to clipboard operation
opentelemetry-ruby-contrib copied to clipboard

sidekiq instrumentation: redis span is sibling to sidekiq instead of child

Open blumamir opened this issue 4 years ago • 6 comments

Description of the bug When instrumenting a sidekiq "perform_async" call, two spans are generated, one for sidekiq, and another for the underlying operation to redis (assuming that redis instrumentation is also installed). I expected the redis span to be child of sidekiq span, as it is executed by it inernally and in sidekiq context (line 3 in image is sibling to line 2 instead of child):

image

Share a simplified reproduction if possible


# config/initializers/opentelemetry.rb

require 'opentelemetry/sdk'
require 'opentelemetry/exporter/otlp'
require 'opentelemetry/instrumentation/all'

OpenTelemetry::SDK.configure do |c |
    c.service_name = 'ruby-monolith'
    c.use_all 'OpenTelemetry::Instrumentation::ActionPack' => { enable_recognize_route: true }

    c.add_span_processor(
    OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(
        OpenTelemetry::Exporter::OTLP::Exporter.new(endpoint: '***', headers: {
        'Authorization' => '***'
        })
      )
    )
end

# app/controllers/api/v1/sidekiq_controller.rb

class Api::V1::SidekiqController < ApplicationController

    # GET /sidekiq
    def index
        HardWorker.perform_async('bob', 5)
        render status: 200
    end

end

blumamir avatar Nov 01 '21 12:11 blumamir

Thanks for opening this issue and providing us with details.

Do you have a hypothesis as to why this is happening?

arielvalentin avatar Nov 01 '21 13:11 arielvalentin

Thanks for opening this issue and providing us with details.

Do you have a hypothesis as to why this is happening?

I am new to ruby so not sure how to approach and debug this issue. I assume that the context should be set using some mechanism here so the sidekiq code executed after the TracerMiddleware will have context with current span set to the sidekiq span, but don't know how this can be achieved.

blumamir avatar Nov 01 '21 13:11 blumamir

@blumamir Welcome to the Ruby Community!

I am not familiar enough with Sidekiq (or Redis low level API) to know how it works but I can say with certainty that there is an implicit parent context that is shared as a Fiber local variable so as long as the code executes within the same fiber an new child spans will be attached to the "current" span:

Tracer.in_span -> Tracer.start_span -> Trace.current_span

https://github.com/open-telemetry/opentelemetry-ruby/blob/main/api/lib/opentelemetry/trace/tracer.rb#L28 https://github.com/open-telemetry/opentelemetry-ruby/blob/main/sdk/lib/opentelemetry/sdk/trace/tracer.rb#L33

The redis instrumentation does the same:

https://github.com/open-telemetry/opentelemetry-ruby/blob/main/instrumentation/redis/lib/opentelemetry/instrumentation/redis/patches/client.rb#L45

Looking at the test suite for the sidekiq instrumentation, it looks like it tests that the spans are generated sequentially but does not check their hierarchy:

https://github.com/open-telemetry/opentelemetry-ruby/blob/d11cbd1ed101280d0c64d38709c16eab0e92fc9d/instrumentation/sidekiq/test/opentelemetry/instrumentation/sidekiq/patches/launcher_test.rb#L48

It is difficult to tell without knowing more about how Sidekiq uses Redis under the hood, e.g. does it indeed execute a redis call as part of the client pipeline or is it batching them and sending all requests for enqueued jobs together? If it is the latter case then the span output you see is correct.

The only thing I can see for certain is that it is using Redis pipelining https://redis.io/topics/pipelining

arielvalentin avatar Nov 01 '21 14:11 arielvalentin

Thanks @arielvalentin :) I appreciate the warm welcome and the time you put into pointing me to the relevant part of the code. Lot's of new info - I'll need to do some reading and digging to feel comfortable enough with this flow. In the meantime, maybe someone has the knowledge and time to investigate this low priority issue and propose a fix.

blumamir avatar Nov 01 '21 15:11 blumamir

I appreciate the warm welcome and the time you put into pointing me to the relevant part of the code. Lot's of new info - I'll need to do some reading and digging to feel comfortable enough with this flow. In the meantime, maybe someone has the knowledge and time to investigate this low priority issue and propose a fix.

We really could use the help so please take your time investigating.

arielvalentin avatar Nov 01 '21 18:11 arielvalentin

We have the same problem.

I think I have an idea what is happening here. Basically in sidekiq there's:

    def push(item)
      normed = normalize_item(item)
      payload = process_single(item['class'], normed)

      if payload
        raw_push([payload])
        payload['jid']
      end
    end

process_single method handles the middleware which creates the "send" span. But the Redis call is happening in the raw_push method. That's why the Redis call is not the child of the send span.

I think one option to solve this would be to patch the push method instead of using the client middleware, but I'm open to suggestions.

indrekj avatar Dec 21 '21 11:12 indrekj

👋 This issue has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this issue will be closed eventually by the stale bot.

github-actions[bot] avatar Apr 27 '23 01:04 github-actions[bot]