scala3
scala3 copied to clipboard
New lazy vals implementation
Cleaned up version of https://github.com/lampepfl/dotty/pull/14545
Implementing the new lazy vals scheme mentioned in https://github.com/lampepfl/dotty/pull/6979, fixing https://github.com/lampepfl/dotty/issues/7140
New PR to avoid force pushing to main branch of the previous author.
Fixes https://github.com/lampepfl/dotty/issues/15149 as well Applied https://github.com/lampepfl/dotty/pull/14780
@odersky Created this as a new PR with source branch at dotty-staging so you can edit it.
test performance please
test performance please
performance test scheduled: 7 job(s) in queue, 1 running.
Performance test finished successfully:
Visit https://dotty-bench.epfl.ch/15296/ to see the changes.
Benchmarks is based on merging with main (89c3c109ea)
test performance please
performance test scheduled: 6 job(s) in queue, 1 running.
A few years ago, I wrote micro-benchmarks for Dotty lazy val implementations: https://github.com/allanrenucci/dotty-benchmark/tree/lazy-vals-v2
You may be interested in running these benchmarks against the new implementation.
@allanrenucci thanks, I will use it
Performance test finished successfully:
Visit https://dotty-bench.epfl.ch/15296/ to see the changes.
Benchmarks is based on merging with main (3ddb21c687)
test performance please
performance test scheduled: 1 job(s) in queue, 1 running.
Performance test finished successfully:
Visit https://dotty-bench.epfl.ch/15296/ to see the changes.
Benchmarks is based on merging with main (e71fe80ff4)
@allanrenucc I think that some files may be missing on this branch - LazyVolatile and LazySimCellWithPublicBitmap are not there. Do you happen to know where to find them?
I don't fully remember what these objects refer to. LazyVolatile looks very similar to scala.runtime.LazyVals. Not sure about LazySimCellWithPublicBitmap .
Result of running benchmark on 3.1.2 vs this branch (average from jmh):
| Benchmark | Time - 3.1.2 | Time - New |
|---|---|---|
| Contended Init - 5kk, 8 threads | 939.305ms (+/- 23.7ms) | 477.659ms (+/- 15.1ms) |
| Contended Init - 5kk, 4 threads | 602.388ms (+/- 31.7ms) | 348.173ms (+/- 7.2ms) |
| Contended Init - 5kk, 2 threads | 423.829ms (+/- 23.2ms) | 516.602ms (+/- 6.8ms) |
| Uncontended init | 18.633ns (+/- 0.7ns) | 21.124ns (+/- 0.9ns) |
| Initialized access | 2.198ns (+/- 0.0ns) | 2.774ns (+/- 0.1ns) |
There is much better performance for contended multi-threaded access (for some cases 2 times faster, but depends on number of threads and problem), but the performance for single threaded access is slower (about 25%).
Tests from https://github.com/allanrenucci/dotty-benchmark/tree/lazy-vals-v2
Maybe the initialised read case should come first. I suspect this is the most common case. I.e.
def x: A =
while true do
val current: AnyRef = _x
if current.isInstanceOf[A] then
return current.asInstanceOf[A]
else ...
Another thing to consider is to provide a small accessor method that the JVM could easily inline. E.g.
def x: A = if current.isInstanceOf[A] then current.asInstanceOf[A] else computeX()
This was suggested by @sjrd in #5478.
@sjrd Could you take a look again? I addressed everything I think. I will look a bit into the performance to see if I can change anything in single-threaded case.
I think we need a decision whether the performance hit for the uncontended init case and the initialized access case is acceptable (or whether we can improve the performance). I am not so sure about this, in fact. I believe these cases matter the most, by far. The way I think about it is that initialized access would happen about an order of magnitude more often than uncontended init, and both are several orders of magnitude more common than the contended cases.
I agree, I will investigate the performance.
it would be interesting also to compare how lazy vals perform with a large number of the new virtual threads in Jdk 19
Alternative flows that I mentioned off-line during the meeting:
- Have
Waiting,EvaluatingandNullValueall extend a common superclass. Then type-test for that class first. Hopefully that will get the code more quickly to the default, fully-initialized code path. - If that's not enough, make all 3 the same final class.
EvaluatingandNullValuewould then be two magical instances ofWaitingthat are not actually waiting. - Otherwise, as mentioned above, type-test for the type of the lazy val first (only when it's not a supertype of
Waiting).
Ok so after working on it a bit I think I will implement an algorithm like this:
private var _x: AnyRef = null
def x: A =
if !(_x == null || _x.isInstanceOf[LazyValControlState]) then
return _x.asInstanceOf[A]
else if _x == NullValue then
return null
else
return x_compute()
private def x_compute() =
while true do
val current: AnyRef = _x
if current == null then
if CAS(_x, null, Evaluating) then
var result: AnyRef = null
try
result = rhs
return result
finally
if result == null then result = NullValue
if !CAS(_x, Evaluating, result) then
val lock = _x.asInstanceOf[Waiting]
CAS(_x, lock, result)
lock.release()
else
if current.isInstanceOf[LazyValControlState] then
if CAS(current, Evaluating, new Waiting) then
<EmptyTree>
else if current.isInstanceOf[NullValue] then
return null
else if current.isInstanceOf[Waiting] then
current.asInstanceOf[Waiting].await()
else
return current.asInstanceOf[A]
end while
I wonder if using a special value for Uninitialized would be more efficient? It removes some some branches.
(this might have been suggested before, but I don't remember where)
private var _x: AnyRef = Uninitialized // subclass of LazyValControlState
def x: A =
if !_x.isInstanceOf[LazyValControlState] then
return _x.asInstanceOf[A]
else
return x_compute()
private def x_compute() =
while true do
val current: AnyRef = _x
if current == Uninitialized then
if CAS(_x, Uninitialized, Evaluating) then
var result: AnyRef = Uninitialized
try
result = rhs
return result
finally
if !CAS(_x, Evaluating, result) then
val lock = _x.asInstanceOf[Waiting]
CAS(_x, lock, result)
lock.release()
else
if current.isInstanceOf[LazyValControlState] then
if CAS(current, Evaluating, new Waiting) then
<EmptyTree>
else if current.isInstanceOf[Waiting] then
current.asInstanceOf[Waiting].await()
else
return current.asInstanceOf[A]
end while
The problem with that is that you have to actually initialize the field to Uninitialized in the constructor. The idea of using null is that we get it for free when creating the object. That said, it may be worth a try, of course.
I was thinking about it, but it seemed a bit problematic because it introduces an additional state - null before Uninitialized. If the value is referenced in the constructor before initialization of lazy val, then we would have a problem, for example:
Right now in this snippet:
object A {
val x = y
lazy val y = "foo"
}
The x is equal to "foo". If we make implement the Uninitialized approach for lazy vals, and treat null as null value, we face two problems:
- Firstly, the value of
xwould be equal tonull, as in the constructor theval x = ywould be called before assigning Uninitialized to they. And thenullin y would be interpreted as a null value. - Secondly, it poses a problem in e.g. integer values. If in the example above we had a
lazy val y = 3, I believe we would get a runtime error for assigningnullto val of typeInt. I think we would need to handle these cases separately which would cause us to add additional if-statement to the getter anyway.
Nevertheless, I think that performance-wise it's cheaper to initialize the lazy val in constructor once than to have additional if-statement on every call. However, if we do that, we are still left with the problems above.
You can "early"-initialize it, before calling the super constructor. Before that, no code can possibly refer to it.
I am not sure if I understand. If I have a snippet like this:
object A {
val x = y
lazy val y = "foo"
}
Then I think it would be compiled to something like this:
object A {
val x = null
val _y = null
<init> {
x = y() // -> null
_y = Unitialized
}
def y() = { //simplified
if _y == null then
return null
else ...
}
}
It seems that the only way to fix it is to move y initialization up in the constructor, but then it becomes weird when there are many lazy vals. I may be missing something important here.
Edit: Resolved, look at comment below.
Oh, ok, now I see. If I put all the lazy = Uninitialized as first instructions in the constructor, then it will work.
What would it take to leave the old implementation available under a Java system property?
Especially if we do this — but even if we don’t — I think this is a PR that would especially benefit from some good user-facing documentation — in one place, not scattered around multiple issues and PRs — of what the possible impacts on end users might be. (For example, https://github.com/lampepfl/dotty/pull/6979 has a nice pros-and-cons list, but I'm not sure how up-to-date it still is, and I had to poke around for a while to find it.)