jetty.project icon indicating copy to clipboard operation
jetty.project copied to clipboard

Experiment with fully virtual VirtualThreadPool

Open gregw opened this issue 1 year ago • 13 comments

Virtual threads are used for all threading.

gregw avatar Mar 10 '24 11:03 gregw

This will also need #11504

gregw avatar Mar 11 '24 17:03 gregw

If a selector thread is virtual, then it's pinned when it is waiting in select().

It is difficult to show with jstack, but using a mix of jstack and a call to the MBean HotspotDiagnostic.dumpThreads("/tmp/jvm.dump", JSON), and with system property -Djdk.trackAllThreads=true, you get, for example:

From jstack

"ForkJoinPool-1-worker-1" #31 [58094] daemon prio=5 os_prio=0 cpu=1.15ms elapsed=963.80s allocated=22128B defined_classes=5 tid=0x000074ee24d6f450  [0x000074edf482c000]
   Carrying virtual thread #30
	at jdk.internal.vm.Continuation.run([email protected]/Continuation.java:248)
	at java.lang.VirtualThread.runContinuation([email protected]/VirtualThread.java:221)
	at java.lang.VirtualThread$$Lambda/0x00000007c01ffa08.run([email protected]/Unknown Source)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec([email protected]/ForkJoinTask.java:1423)
	at java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:387)
	at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1312)
	at java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1843)
	at java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1808)
	at java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:188)

From the /tmp/jvm.dump file:

         {
           "tid": "30",
           "name": "server0",
           "stack": [
              "java.base\/sun.nio.ch.EPoll.wait(Native Method)",
              "java.base\/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:121)",
              "java.base\/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:130)",
              "java.base\/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:147)",
              "[email protected]\/org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:181)",
              "[email protected]\/org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:188)",
              "[email protected]\/org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:612)",
              "[email protected]\/org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:546)",
              "[email protected]\/org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:512)",
              "[email protected]\/org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)",
              "[email protected]\/org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)",
              "java.base\/java.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run(ThreadPerTaskExecutor.java:314)",
              "java.base\/java.lang.VirtualThread.run(VirtualThread.java:309)"
           ]
         },

sbordet avatar Mar 12 '24 16:03 sbordet

ForkJoinPool, which is used as the default scheduler for virtual threads, can be constructed using a "parallelism" and a "maxPoolSize".

By default, parallelism=#cores and maxPoolSize=256 (see VirtualThread.createDefaultScheduler() in the JDK sources).

This means that if we have 8 cores and 8 selectors, we would be completely pinned, if not that FJP can "overshoot" and create other platform threads, up to 256 total.

Setting -Djdk.virtualThreadScheduler.maxPoolSize=8 would limit the overshoot to 8, but if you have 8 virtual threads pinned in select() now the system is completely pinned.

The default idle timeout for the virtual thread FJP scheduler is 30 seconds, so they stay around for a while.

The option of only using virtual threads starts to feel in the camp of "magic", as in you think you know how the system works, but not really, until you know the details above.

sbordet avatar Mar 12 '24 16:03 sbordet

We should test this change with the perf profiler but #11513 should be fixed first.

lorban avatar Mar 13 '24 09:03 lorban

@sbordet @lorban because of pinning, I doubt this is a viable mode to run it.... but it would still be interesting to include this branch benchmarks just to see what a fully virtual server would run like. This would remove the overheads of both QTP and RTE, but with the risk of parallel slowdown. It might still be a fast (if somewhat risky) mode to run in? If we ever merge this, then we probably should modify the selector heuristics to never have more selectors than cores.

gregw avatar Mar 21 '24 05:03 gregw

Considering pinning, I think the current virtual thread mode is safe because we force reserved threads to 0, thus we will never tryExecute a producer task, so the producer should never be virtual. We only use virtual threads to consume BLOCKING tasks, which should not be selectors.... but may be other code that pins the cores, but that should not be us at least.

gregw avatar Mar 21 '24 05:03 gregw

@sbordet @lorban I reverted the name threadpool-hybrid to threadpool-virtual. The new mode is now called threadpool-all-virtual.

gregw avatar Mar 23 '24 11:03 gregw

@sbordet @lorban thoughts on this? Is this something we want to play with, or is pinning just too much of an issue to make it worthwhile?

gregw avatar Apr 29 '24 02:04 gregw

@gregw keep it! 😄

Various comments:

  1. We tried to debug a virtual thread issue using what recommended by by JEP 444 in section Observing virtual threads. Unfortunately, jcmd outputs a limited dump that does not report enough information (e.g. does not associate the virtual thread with its carrier, so there is no information if a virtual thread is pinned). The TrackingVirtualExecutor in this PR will be able to do so, provided that dumping capabilities are added. I have just added this:
@Override
public void dump(Appendable out, String indent) throws IOException
{
    dumpObjects(out, indent, new DumpableCollection("virtual threads", _threads.stream().map(t -> new DumpableCollection(t.toString(), List.of(t.getStackTrace()))).toList()));
}

to get state & carrier information, and stack frames even of parked virtual threads. This would be invaluable. However, TrackingVirtualExecutor should be a reusable class that I can use when doing:

QueuedThreadPool qtp = new QueuedThreadPool();
qtp.setVirtualThreadExecutor(new TrackingVirtualExecutor(Executors.newThreadPerTaskExecutor()));
  1. I'd get rid of ThreadFactoryExecutor to use the established patter that if you need a ThreadFactory, it is passed as a parameter.
  2. By default, carrier threads can overshoot up to 256, unless explicitly specified with system property jdk.virtualThreadScheduler.maxPoolSize. Therefore, I think there may be value to have a VirtualThreadPool (although the name is an anti-pattern 😄) as, by default, even if we pin virtual threads that do select(), the JDK overshoots up to 256 so there should be room to spare for carriers, even when setting selectors=cores, which would pin all "default" virtual threads, but the JDK would still allocate more carriers (unless you have 256 cores).
  3. I'd rename the -all- with -full-.

sbordet avatar Apr 29 '24 18:04 sbordet

@olamy the CI build is green. Any idea how I can make it update here?

gregw avatar May 02 '24 04:05 gregw

@olamy the CI build is green. Any idea how I can make it update here?

fixed.

olamy avatar May 03 '24 01:05 olamy

@lorban @sbordet nudge!

gregw avatar May 06 '24 02:05 gregw

@sbordet sorry I missed a few comments. I'm having troubling making the new intellij review mode update always. I think I have got them all now.

gregw avatar May 06 '24 10:05 gregw