Virtual Thread Segment Processing Pool

Open jtuglu1 opened this issue 2 months ago • 7 comments

Description

Implement a virtual thread segment processing pool now that JDK 21 is supported. From my benchmarks, ≥ 30% of processing pool threads are in the WAITING state (blocked on I/O) on historical nodes, and 80-90% of total individual segment processing time is spent waiting for an available thread in the processing pool during spikes in QPS.

Virtual threads could benefit here as we can increase our processing concurrency without significant overhead.

Additionally, thoughts on having proper thread state accounting metrics? I'm thinking something like jvm/thread/count where dimensions are:

  • state: the state the thread is in
  • pool: the thread pool the thread belongs to

This will help with debugging performance issues.
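As a rough illustration of the proposed jvm/thread/count dimensions, here is a minimal sketch that groups live threads by pool and state using the standard `ThreadMXBean`. The class name `ThreadStateCounter` and the rule for deriving a pool name (strip a trailing `-<number>` from the thread name) are assumptions for illustration, not Druid's actual naming or monitor API. Note that, as far as I know, `ThreadMXBean` reports platform threads only, so virtual threads would need separate accounting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class ThreadStateCounter {
    // Hypothetical convention: the pool is the thread name minus a trailing "-<number>".
    static String poolOf(String threadName) {
        return threadName.replaceFirst("-\\d+$", "");
    }

    // Counts live threads keyed by "pool|STATE". No stack traces are captured.
    static Map<String, Integer> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = mx.getThreadInfo(mx.getAllThreadIds());
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            String key = poolOf(info.getThreadName()) + "|" + info.getThreadState();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each entry would map to one jvm/thread/count event with pool and state dimensions.
        snapshot().forEach((k, v) -> System.out.println("jvm/thread/count " + k + " = " + v));
    }
}
```

A periodic monitor could call `snapshot()` and emit one event per key. Sampling like this will still miss very short-lived state changes.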

Motivation

This will reduce query/wait/time per-segment.

jtuglu1 avatar Nov 10 '25 18:11 jtuglu1

cc @clintropolis @kfaraz @gianm

jtuglu1 avatar Nov 14 '25 06:11 jtuglu1

I think this is worth investigating. I would imagine that for workloads/tunings that are more likely to be disk bound this could potentially be effective. With sufficient ‘free’ memory, though, lots of segments can sit in the page cache since it is using mmap, so those workloads might benefit less, since there is quite a lot of actual work they are doing too. I would think this needs lots of measurement for sure. This is something that is on my radar, but I don't personally plan to look into it until after we have migrated to having Dart as the default query engine, so that there is less to focus on measuring and fewer moving parts.

Another thing that is really important to think about: because the pool is currently a fixed size, it is also controlling memory usage. This includes both heap usage and the number of ‘processing’ buffers that are active, since that is currently a non-blocking pool, so we would need to rework that to be blocking to control memory size, I think. I haven't thought much beyond these implications. I think we would want to get to where query engines primarily use off-heap memory (which we have been working towards) and the engines also transition to more granular memory allocations, so that usage can grow as required instead of setting maximum bounds immediately up front. If done well, this would allow better usage of virtual threads because of better resource usage.
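To make the "blocking buffer pool" idea concrete, here is a minimal sketch assuming JDK 21. `BlockingBufferPoolSketch` is a hypothetical class, not Druid's processing buffer pool. It shows many virtual threads sharing a small, fixed set of direct buffers, where `take()` blocks until a buffer is free, so memory stays bounded by the buffer count rather than by the thread count.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingBufferPoolSketch {
    private final BlockingQueue<ByteBuffer> buffers;

    BlockingBufferPoolSketch(int count, int sizeBytes) {
        buffers = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            buffers.add(ByteBuffer.allocateDirect(sizeBytes)); // off-heap, allocated once
        }
    }

    // Blocks the calling (virtual) thread until a buffer is available.
    ByteBuffer take() throws InterruptedException {
        return buffers.take();
    }

    void give(ByteBuffer buffer) {
        buffer.clear();
        buffers.add(buffer);
    }

    // Runs `tasks` virtual threads against `bufferCount` buffers and returns
    // the highest number of buffers observed in use at the same time.
    static int runDemo(int tasks, int bufferCount) {
        BlockingBufferPoolSketch pool = new BlockingBufferPoolSketch(bufferCount, 1024);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger maxInUse = new AtomicInteger();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                exec.submit(() -> {
                    ByteBuffer buf = pool.take();
                    try {
                        maxInUse.accumulateAndGet(inUse.incrementAndGet(), Math::max);
                        Thread.sleep(5); // stand-in for segment processing work
                    } finally {
                        inUse.decrementAndGet();
                        pool.give(buf);
                    }
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return maxInUse.get();
    }

    public static void main(String[] args) {
        System.out.println("max buffers in use: " + runDemo(50, 4));
    }
}
```

The trade-off is that wait time moves from "waiting for a processing thread" to "waiting for a buffer", so this only helps if threads spend meaningful time blocked while not holding a buffer.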

Also, semi related, the virtual storage 'on demand' thread pool would surely benefit from being virtual threads, I have done some experimentation with this and while I haven't measured it yet, it is really easy to wire up (at least if you ignore maintaining compatibility for older java versions... though there are possible solutions for this I've also been playing with like multi-release jars). If it turns out virtual threads are good for the processing pool, then the on demand downloads could potentially just be done inline in the same virtual thread pool instead of being a separate pool dedicated for downloads, hard to say without experimentation.

clintropolis avatar Nov 14 '25 06:11 clintropolis

On the topic of experimentation, did you have any thoughts on this proposed jvm/thread/count metric? Currently the JvmThreadMonitor is a bit sparse and doesn't track this stuff well. This kind of stuff also doesn't show up in manual flamegraph/jstack trace since it's highly variable and extremely short-lived (ns/us time horizon).

My thought would be to create a way for all Druid thread pools to initialize threads in a way where emitting counters per state as well as poolName would be possible.
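One possible shape for this, sketched under assumptions: wrap each pool's `ThreadFactory` so that created threads are registered under a pool name, then count them by `Thread.getState()` on demand. `TrackedThreadFactories` and its methods are hypothetical names, not existing Druid classes. Because it only relies on `ThreadFactory` and `Thread.getState()`, the same wrapper should also work with a virtual thread factory such as `Thread.ofVirtual().factory()` on JDK 21.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadFactory;

public class TrackedThreadFactories {
    // poolName -> weakly held set of threads created for that pool
    private static final Map<String, Set<Thread>> POOLS = new ConcurrentHashMap<>();

    // Wraps a factory so every thread it creates is registered under poolName.
    static ThreadFactory tracked(String poolName, ThreadFactory delegate) {
        Set<Thread> threads = POOLS.computeIfAbsent(
            poolName,
            k -> Collections.synchronizedSet(
                Collections.newSetFromMap(new WeakHashMap<Thread, Boolean>())));
        return runnable -> {
            Thread t = delegate.newThread(runnable);
            threads.add(t);
            return t;
        };
    }

    // Point-in-time count of registered threads per state for one pool.
    static Map<Thread.State, Integer> countByState(String poolName) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        Set<Thread> threads = POOLS.getOrDefault(poolName, Collections.emptySet());
        synchronized (threads) { // required when iterating a synchronizedSet
            for (Thread t : threads) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

A monitor could then emit one jvm/thread/count event per (pool, state) pair from `countByState`. Since this is still sampling, state changes on the ns/us horizon would only show up statistically over many samples.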

jtuglu1 avatar Nov 14 '25 07:11 jtuglu1

Also, semi related, the virtual storage 'on demand' thread pool would surely benefit from being virtual threads, I have done some experimentation with this and while I haven't measured it yet, it is really easy to wire up (at least if you ignore maintaining compatibility for older java versions...

On this note: even without the on-demand thread pool, would it be worth replacing the current segment loading historical boot pool as well with virtual threads?

jtuglu1 avatar Nov 18 '25 01:11 jtuglu1

Sounds reasonable to me in general. I do believe @clintropolis is right that we would want to have a memory model where memory is associated with the query rather than the processing thread. Otherwise each virtual thread would need a big chunk of memory, which limits the number of virtual threads we can make.

Dart/MSQ already works like this: the "processing buffers" are not a shared pool, but are actually per-query. (Unlike native queries, where the "processing buffers" are associated 1-1 with processing threads.) So it could be interesting to try out the virtual thread idea there.

gianm avatar Nov 22 '25 00:11 gianm

have a memory model where memory is associated with the query rather than the processing thread.

Makes sense.

jtuglu1 avatar Nov 22 '25 00:11 jtuglu1

@gianm seems similar to the ideas referenced in: https://github.com/apache/druid/pull/11691

jtuglu1 avatar Nov 22 '25 00:11 jtuglu1