spring-cloud-sleuth icon indicating copy to clipboard operation
spring-cloud-sleuth copied to clipboard

spring-cloud-sleuth trace stops working for scheduled jobs in the presence of a ScheduledExecutorService bean

Open ewirch opened this issue 4 years ago • 4 comments

Describe the bug

When the Spring context contains a ScheduledExecutorService bean, and when

  1. ScheduledTaskRegistrar is used to schedule recurring background jobs, or
  2. @Scheduled(cron) is used

each job will generate an individual trace id, and this very trace id will be used over and over again for each run of the job.

spring-cloud-sleuth:3.0.2

Sample

@Configuration
public class ManagedTaskConfiguration implements SchedulingConfigurer {
    @Override
    public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {
        taskRegistrar.addCronTask(() -> {
            System.out.println(MDC.getCopyOfContextMap());
        }, "* * * * * *");
    }
}

The first trace id printed will be null, because the spring context is not ready yet, but then the job will stick to printing the same trace id.

See attached demo project. It contains both malfunctioning examples. demo.zip

Analysis

The problem is the order of runnable nesting. ScheduledTaskRegistrar.addCronTask() eventually calls org.springframework.scheduling.concurrent.ConcurrentTaskScheduler#schedule(java.lang.Runnable, org.springframework.scheduling.Trigger), which calls new ReschedulingRunnable(task, trigger, this.clock, this.scheduledExecutor, errorHandler).schedule();. ReschedulingRunnable passes itself to ScheduledExecutorService, which is a TraceableScheduledExecutorService when spring-cloud-sleuth is used. When this happens, the spring context is not ready yet, and ScheduledExecutorService directly schedules ReschedulingRunnable.

Time passes, and ReschedulingRunnable is executed. It runs its delegate, and, as the last step, re-schedules itself by calling ScheduledExecutorService.schedule() again. This time TraceableScheduledExecutorService wraps the ReschedulingRunnable in a TraceRunnable: TraceRunnable(ReschedulingRunnable(delegate)). The constructor of TraceRunnable is remembering the parent span parent = tracer.currentSpan();, which will be null at that point.

Time passes, and the TraceRunnable is executed now. Since parent is null a new trace is created. TraceRunnable invokes ReschedulingRunnable, which invokes it's delegate. When the delegate finished, ReschedulingRunnable re-schedules itself again. Since TraceRunnable had not a chance to clean up yet, there is still a live trace. When ReschedulingRunnable calls TraceableScheduledExecutorService.schedule() the constructor of TraceRunnable will pick up the current trace, and will save it as its parent trace. ReschedulingRunnable returns, TraceRunnable cleans up, but it is too late. There is already a TraceRunnable scheduled with the old trace.

From that point on re-scheduling will make TraceRunnable to always pick up the trace. The scheduled job will never generate a new trace id, because TraceRunnable calls ReschedulingRunnable, and not the other way round.

ewirch avatar Aug 27 '21 13:08 ewirch

Could you create a small sample that replicates this problem from start.spring.io and attach it to this issue / create a separate repo with it? That would be super useful!

marcingrzejszczak avatar Aug 30 '21 08:08 marcingrzejszczak

OK I have a workaround for you.

Add the property --spring.sleuth.async.enabled=false to disable automated instrumentation of executor services.

Manually instrument the runnable going to the cron task.

package com.example.demo;

import org.slf4j.MDC;

import org.springframework.cloud.sleuth.SpanNamer;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.cloud.sleuth.instrument.async.TraceRunnable;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.SchedulingConfigurer;
import org.springframework.scheduling.concurrent.CustomizableThreadFactory;
import org.springframework.scheduling.config.ScheduledTaskRegistrar;

import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;

@Configuration
public class JobConfiguration implements SchedulingConfigurer {

    private final Tracer tracer;

    private final SpanNamer spanNamer;

    public JobConfiguration(Tracer tracer, SpanNamer spanNamer) {
        this.tracer = tracer;
        this.spanNamer = spanNamer;
    }

    @Override
    public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {
        taskRegistrar.addCronTask(new TraceRunnable(tracer, spanNamer, () -> {
            System.out.println("SchedulingConfigurer: " + MDC.getCopyOfContextMap());
        }), "* * * * * *");
    }

    @Bean
    public ScheduledExecutorService scheduledExecutorService() {
        return new ScheduledThreadPoolExecutor(6, namedThreadFactory("scheduled-task-thread-%d"));
    }

    private ThreadFactory namedThreadFactory(String namePrefix) {
        return new CustomizableThreadFactory(namePrefix);
    }
}

Then the results are:

SchedulingConfigurer: {traceId=49cc3c88f5059601, spanId=49cc3c88f5059601}
@Scheduled {traceId=725fdb7c0b7d8b58, spanId=725fdb7c0b7d8b58}
SchedulingConfigurer: {traceId=815c8374090056f1, spanId=815c8374090056f1}
@Scheduled {traceId=bdc91ceb2e33dd24, spanId=bdc91ceb2e33dd24}
SchedulingConfigurer: {traceId=ceb5865154f38cb6, spanId=ceb5865154f38cb6}
@Scheduled {traceId=90251b2da421356c, spanId=90251b2da421356c}
SchedulingConfigurer: {traceId=b5433bb9c1e826c8, spanId=b5433bb9c1e826c8}
@Scheduled {traceId=279a02a77d443e33, spanId=279a02a77d443e33}
@Scheduled {traceId=a30c68aaa5e081ce, spanId=a30c68aaa5e081ce}
SchedulingConfigurer: {traceId=17975ef9448fe702, spanId=17975ef9448fe702}
SchedulingConfigurer: {traceId=c9b0d27a2b4e59de, spanId=c9b0d27a2b4e59de}
@Scheduled {traceId=07e20e2967e69535, spanId=07e20e2967e69535}

marcingrzejszczak avatar Sep 01 '21 13:09 marcingrzejszczak

We'll have to figure out the solution to this though.

marcingrzejszczak avatar Sep 01 '21 13:09 marcingrzejszczak

So it appears that two different things add traceId to my cron jobs:

  1. TraceableScheduledExecutorService (spring.sleuth.async.enabled)
  2. TraceSchedulingAspect (spring.sleuth.scheduled.enabled)

The aspect is working fine The executor wraps the runnable into a TraceRunnable that schedules the next trigger before it closes the span

I used spring.sleuth.async.ignored-beans to disable proxying of the Executor, this solved the problem in my case

mplain avatar Apr 19 '22 21:04 mplain

Please upgrade to Micrometer Tracing. Spring Cloud Sleuth is feature complete and out of OSS support.

marcingrzejszczak avatar Feb 09 '24 13:02 marcingrzejszczak