decaton icon indicating copy to clipboard operation
decaton copied to clipboard

Change the measurement target of `decaton.processor.processed.time`

Open emthrm opened this issue 6 months ago • 1 comments

Question

As shown below, the metric decaton.processor.processed.time appears to have changed since v8. Was this intentional?

< v8

The metric measured the total time taken by the processors to process tasks, including the time spent on scheduling. https://github.com/line/decaton/blob/642a81d770e64549fae1dba3692d84cf42deaf25/processor/src/main/java/com/linecorp/decaton/processor/runtime/internal/ProcessorUnit.java#L71-L85

>= v8

It now appears to measure the time taken for the ThreadPoolExecutor#execute call with the argument () -> processTask(request) to finish. https://github.com/line/decaton/blob/15f4e38ca824ecef6d201c1274aa9ce6d6139ed6/processor/src/main/java/com/linecorp/decaton/processor/runtime/internal/ThreadPoolSubPartitions.java#L87-L99

https://github.com/line/decaton/blob/15f4e38ca824ecef6d201c1274aa9ce6d6139ed6/processor/src/main/java/com/linecorp/decaton/processor/runtime/internal/ProcessorUnit.java#L52

I believe this measures the task queuing time rather than the actual execution time of the command. Ref. https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.base/share/classes/java/util/concurrent/ThreadPoolExecutor.java#L1325-L1377

Possible Solution

If this change was unintentional, the metric should measure the execution time of processTask(request) directly. Although I do not have a definitive solution, one potential approach is to modify executor.execute(() -> processTask(request)); as follows:

executor.execute(() -> {
    Timer timer = Utils.timer();
    try {
        processTask(request);
    } finally {
        metrics.processorProcessedTime.record(timer.duration());
    }
});

However, since this change could also affect VirtualThreadSubPartitions, it requires careful consideration.

emthrm avatar Jun 23 '25 14:06 emthrm

Thank you for reporting.

Nice catching. Definitely this is unintentional. As you suggested, measuring the duration in ProcessorUnit might be good, however we don't want to expose metric on VirtualThreadSubPartitions because it would cause per-thread metric which is too high cardinality. (That's the reason why only ThreadPoolSubPartitions has thread utilization metrics)

So, having thread utilization metrics optinally in ProcessorUnit only on ThreadPoolSubPartitions might be good.

By the way, are you interested in sending a patch for this?

Thanks,

ocadaruma avatar Jun 24 '25 01:06 ocadaruma