cask
cask copied to clipboard
Virtual Thread support
Quick Summary: 230.51 / S (Worker Thread)
VS
1545.18 / S (Virtual Threads)
Motivation: Fix https://github.com/com-lihaoyi/cask/issues/90
I did some analyse on cask, and found it's using undertow for now. and the undertow itself is using Object lock as a lock in the http://www.jboss.org/xnio codebase, which we can do some contribution there to remove the monitor and migrating to ReentrantLock. Which should be better than waiting next Java LTS (maybe Java 25).
So, for now, I think we can just add the Virtual Thread support in cask, and then do some contribution upstream or migrating to another network layer maybe Netty?
The current changes is quite straightfoward.
Modifcation:
- Update the dependecies, Mill, libraries, Scala
- Because I'm using Windows 11, and https://github.com/com-lihaoyi/cask/pull/124 cause some issue, so I have to rename/delete it locally for now.
- I fixed some hint problem to make the code looks better before starting ,mainly inside
Main.scala, others can follow after this PR. - Add
VirtualThreadSupportwhich only works on the Runtime where Virtual Thread is available, otherwise, it returns null.. - Add
NewThreadPerTaskExecutorwhich will start a new Virtual Thread for every Command/Runnable, the Undertow needs this. - Add
VirtualThreadBlockingHandlerwhich works likeBlockingHandlerbut delegates toNewThreadPerTaskExecutorin5, instead of the connections' worker executor. - Add a
cask.virtualThread.enabledsystem property with expected valuetrue|false, and only using the Virtual Thread to runRoutesif and only if both the Runtime supported andcask.virtualThread.enabledis set totrue. I'm not exposing anexecutoror etc thing, because that's would not be the philosophy of cask, but hiding the runtime behavior behind ofcask.virtualThread.enabled, as I was not knowing cask is running with undertow. - Do the some testing with ab, I changed a code with
Thread.sleep(1000)to simulates.
@cask.get("/")
def hello(): String = {
Thread.sleep(1000) // Simulate a slow endpoint
"Hello World! Hello World! Hello World!"
}
Results:
For the first run, I think it works just as expected, I did some assertion about the thread type Thread.isVirtual too during the testing.
Quick Summary: 230.51 / S (Worker Thread) VS 1545.18 / S (Virtual Threads)
My Laptop : i9-14900hx + 64G RAM
Result with Connection Worker Thread Java 21:
C:\Users\hepin>ab -n 10000 -c 10000 "http://localhost:8080/"
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 38 bytes
Concurrency Level: 10000
Time taken for tests: 43.381 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 1740000 bytes
HTML transferred: 380000 bytes
Requests per second: 230.51 [#/sec] (mean)
Time per request: 43381.358 [ms] (mean)
Time per request: 4.338 [ms] (mean, across all concurrent requests)
Transfer rate: 39.17 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.2 0 16
Processing: 1435 19619 10914.5 19634 38933
Waiting: 1011 19598 10928.7 19604 38933
Total: 1435 19619 10914.6 19634 38933
Percentage of the requests served within a certain time (ms)
50% 19634
66% 25433
75% 29280
80% 31199
90% 35040
95% 36965
98% 37949
99% 37957
100% 38933 (longest request)
VS
Result with Virtual Threads on Java 21
C:\Users\hepin>ab -n 10000 -c 10000 "http://localhost:8080/"
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 38 bytes
Concurrency Level: 10000
Time taken for tests: 6.472 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 1740000 bytes
HTML transferred: 380000 bytes
Requests per second: 1545.18 [#/sec] (mean)
Time per request: 6471.734 [ms] (mean)
Time per request: 0.647 [ms] (mean, across all concurrent requests)
Transfer rate: 262.56 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.2 0 16
Processing: 1576 2120 202.6 2175 2527
Waiting: 1005 1631 364.6 1517 2279
Total: 1576 2120 202.6 2175 2527
Percentage of the requests served within a certain time (ms)
50% 2175
66% 2255
75% 2306
80% 2311
90% 2327
95% 2337
98% 2341
99% 2342
100% 2527 (longest request)
Future Actions:
- Add testing about the
NewThreadPerTaskExecutorand testing theRoutesis running inside a VT when all setup. - Add benchmarks with gathing/ab or doing it with Github Action (nightly build)
- Add Github Actions for Java 21 and Virtual thread.
- Adjust the code with feedback.
Left some comments on the code. Thanks for looking into this!
Some high-level things to do:
-
Include virtual thread configuration in a dedicated example in the
examples/folder, and then similarly include it in the doc-site. That would help ensure that it remains discoverable for people who want to try it out, which I expect will be an increasing number as Java 21+ is adopted. -
Embed your benchmark logic into the build system somehow, so other people can reproduce it. You can download and cache the
abbinary viarequests.get/os.writein Mill and then define a smallScalaModuleto spin up Mill and run the benchmark with and without virtual threads. This will easily let people reproduce the benchmark or make adjustments to it as necessary -
Once you've got your benchmarking logic setup, run the benchmarks with/without virtual threads on a few of our other examples: perhaps on
minimalApplicationas a tiny example, andtodoas a big example, andstaticFilesas a non-compute bound example. We expect the benefits to be not as dramatic as with your synthetic example usingThread.sleep, but it is still good to know whether there is any improvement, no improvement, or even regression. That would help us advise users what kind of use cases would benefit from using virtual threads and which ones won't -
How much of the improvement is due to virtual threads, and how much is due to the
NewThreadPerTaskExecutorwe began using? We should run a benchmark withNewThreadPerTaskExecutorenabled but virtual threads disabled to verify that the virtual threads are the benefit, and not just the swapping out of theExecutorstrategy. Does the default one also spawn aNewThreadPerTask, or is it using a fixed thread pool of some sort?
Great, thanks for the quic and detailed review .
I will update this after work, we do see 10~15pt performance improvement and 5pt RAM reduce at internal stress testing, that numbers vary due to different workloads, but in most case,VT is a good choice ,and Nima is using virtual thread by default now.
Update: As @lihaoyi once said , we need comparing the performance with and witout virtual thread, so I have to screen the executor, I tested it locally, it works well.
Even the backing scheduler of Virtual thread is not the default ForkJoinPool, it still works.
VirtualThread[#91,cask-handler-executor-virtual-thread-0]/runnable@pool-1-thread-1
VirtualThread[#93,cask-handler-executor-virtual-thread-1]/runnable@pool-1-thread-2
With this setup, I think I can continue other part of this PR.
object Compress extends cask.MainRoutes {
protected override val handlerExecutor: Executor = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors())
@cask.decorators.compress
@cask.get("/")
def hello(): String = {
println(Thread.currentThread())
"Hello World! Hello World! Hello World!"
}
initialize()
}
@He-Pin let me know when this is ready for review, looking forward to using virtual threads in cask!
@lihaoyi I will continue it this weekend, but currently, I need a PR [https://github.com/com-lihaoyi/cask/pull/124] get merged, because I'm using Windows 11, and I don't have a mac/linux for now.
@He-Pin merged it
Bump on this @He-Pin :)
@lihaoyi Sorry for being late, We encountered a virtual thread deadlocking at work, related to the classloader. When the virtual thread and the other platform thread(not the carrier thread) both try to load the same classes or a virtual thread is been notified but there is no carrier thread(been pinned) to run the virtual thread, then deadlock.
So we just changed the classloader, both in the JVM cpp and the lib.
But, cask is a library, So the best we can do is add some documents about this case, and a simple way is limiting the Max concurrent Virtual threads number.
I'm sorry for not updating this soon, I was a little busy at work and we had this issue and fixed it with JVM modification.
So what do you think about :
- add some documentation about this.
- add some best practices about virtual threads.
@He-Pin I think documentation and best practices are fine. We do not have the power to change the JVM, so best we can do is tell people if they want to use JVM virtual threads with Cask what the best way of doing so is
Bump on this. @He-Pin if you're not able to pick this up, I'll put it up again in the next set of bounties
Sorry for the delay, I was working with our internal JVM team to workaround the synchronization in the Classloader, and another issue I currently encounter is the https://github.com/com-lihaoyi/mill/issues/3168 on Windows.
I will continue this this week, did spend sometime to learn the toolchian.
Because privateLookupIn starts in Java 9, and Cask currnetly support java8, so I have to fallback to Reflect I think.
Got it, if you're still on it I'll leave it to you then!
Feel free to bump the required JVM version of necessary. We dont need to support Java 8 forever, e.g. requests-scala already moved to Java 11. Java is already on 21 or 22, so even jumping to Java 17 would be reasonable
Closing due to inactivity