jetty.project
                                
                                 jetty.project copied to clipboard
                                
                                    jetty.project copied to clipboard
                            
                            
                            
                        Experiment with fully virtual VirtualThreadPool
Virtual threads are used for all threading.
This will also need #11504
If a selector thread is virtual, then it's pinned when it is waiting in select().
It is difficult to show with jstack, but using a mix of jstack and a call to the MBean HotspotDiagnostic.dumpThreads("/tmp/jvm.dump", JSON), and with system property -Djdk.trackAllThreads=true, you get, for example:
From jstack
"ForkJoinPool-1-worker-1" #31 [58094] daemon prio=5 os_prio=0 cpu=1.15ms elapsed=963.80s allocated=22128B defined_classes=5 tid=0x000074ee24d6f450  [0x000074edf482c000]
   Carrying virtual thread #30
	at jdk.internal.vm.Continuation.run([email protected]/Continuation.java:248)
	at java.lang.VirtualThread.runContinuation([email protected]/VirtualThread.java:221)
	at java.lang.VirtualThread$$Lambda/0x00000007c01ffa08.run([email protected]/Unknown Source)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec([email protected]/ForkJoinTask.java:1423)
	at java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:387)
	at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1312)
	at java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1843)
	at java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1808)
	at java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:188)
From the /tmp/jvm.dump file:
         {
           "tid": "30",
           "name": "server0",
           "stack": [
              "java.base\/sun.nio.ch.EPoll.wait(Native Method)",
              "java.base\/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:121)",
              "java.base\/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:130)",
              "java.base\/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:147)",
              "[email protected]\/org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:181)",
              "[email protected]\/org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:188)",
              "[email protected]\/org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:612)",
              "[email protected]\/org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:546)",
              "[email protected]\/org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:512)",
              "[email protected]\/org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)",
              "[email protected]\/org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)",
              "java.base\/java.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run(ThreadPerTaskExecutor.java:314)",
              "java.base\/java.lang.VirtualThread.run(VirtualThread.java:309)"
           ]
         },
ForkJoinPool, which is used as the default scheduler for virtual threads, can be constructed using a "parallelism" and a "maxPoolSize".
By default, parallelism=#cores and maxPoolSize=256 (see VirtualThread.createDefaultScheduler() in the JDK sources).
This means that if we have 8 cores and 8 selectors, we would be completely pinned, if not that FJP can "overshoot" and create other platform threads, up to 256 total.
Setting -Djdk.virtualThreadScheduler.maxPoolSize=8 would limit the overshoot to 8, but if you have 8 virtual threads pinned in select() now the system is completely pinned.
The default idle timeout for the virtual thread FJP scheduler is 30 seconds, so they stay around for a while.
The option of only using virtual threads starts to feel in the camp of "magic", as in you think you know how the system works, but not really, until you know the details above.
We should test this change with the perf profiler but #11513 should be fixed first.
@sbordet @lorban because of pinning, I doubt this is a viable mode to run it.... but it would still be interesting to include this branch benchmarks just to see what a fully virtual server would run like. This would remove the overheads of both QTP and RTE, but with the risk of parallel slowdown. It might still be a fast (if somewhat risky) mode to run in? If we ever merge this, then we probably should modify the selector heuristics to never have more selectors than cores.
Considering pinning, I think the current virtual thread mode is safe because we force reserved threads to 0, thus we will never tryExecute a producer task, so the producer should never be virtual. We only use virtual threads to consume BLOCKING tasks, which should not be selectors.... but may be other code that pins the cores, but that should not be us at least.
@sbordet @lorban I reverted the name threadpool-hybrid to threadpool-virtual.  The new mode is now called threadpool-all-virtual.
@sbordet @lorban thoughts on this? Is this something we want to play with, or is pinning just too much of an issue to make it worthwhile?
@gregw keep it! 😄
Various comments:
- We tried to debug a virtual thread issue using what recommended by by JEP 444 in section Observing virtual threads. Unfortunately,jcmdoutputs a limited dump that does not report enough information (e.g. does not associate the virtual thread with its carrier, so there is no information if a virtual thread is pinned). TheTrackingVirtualExecutorin this PR will be able to do so, provided that dumping capabilities are added. I have just added this:
@Override
public void dump(Appendable out, String indent) throws IOException
{
    dumpObjects(out, indent, new DumpableCollection("virtual threads", _threads.stream().map(t -> new DumpableCollection(t.toString(), List.of(t.getStackTrace()))).toList()));
}
to get state & carrier information, and stack frames even of parked virtual threads.
This would be invaluable.
However, TrackingVirtualExecutor should be a reusable class that I can use when doing:
QueuedThreadPool qtp = new QueuedThreadPool();
qtp.setVirtualThreadExecutor(new TrackingVirtualExecutor(Executors.newThreadPerTaskExecutor()));
- I'd get rid of ThreadFactoryExecutorto use the established patter that if you need aThreadFactory, it is passed as a parameter.
- By default, carrier threads can overshoot up to 256, unless explicitly specified with system property jdk.virtualThreadScheduler.maxPoolSize. Therefore, I think there may be value to have aVirtualThreadPool(although the name is an anti-pattern 😄) as, by default, even if we pin virtual threads that doselect(), the JDK overshoots up to 256 so there should be room to spare for carriers, even when settingselectors=cores, which would pin all "default" virtual threads, but the JDK would still allocate more carriers (unless you have 256 cores).
- I'd rename the -all-with-full-.
@olamy the CI build is green. Any idea how I can make it update here?
@lorban @sbordet nudge!
@sbordet sorry I missed a few comments. I'm having troubling making the new intellij review mode update always. I think I have got them all now.