scala3 icon indicating copy to clipboard operation
scala3 copied to clipboard

New lazy vals implementation

Open szymon-rd opened this issue 3 years ago • 37 comments

Cleaned up version of https://github.com/lampepfl/dotty/pull/14545

Implementing the new lazy vals scheme mentioned in https://github.com/lampepfl/dotty/pull/6979, fixing https://github.com/lampepfl/dotty/issues/7140

New PR to avoid force pushing to main branch of the previous author.

Fixes https://github.com/lampepfl/dotty/issues/15149 as well Applied https://github.com/lampepfl/dotty/pull/14780

szymon-rd avatar May 26 '22 12:05 szymon-rd

@odersky Created this as a new PR with source branch at dotty-staging so you can edit it.

szymon-rd avatar May 26 '22 12:05 szymon-rd

test performance please

prolativ avatar Jun 01 '22 11:06 prolativ

test performance please

bishabosha avatar Jun 01 '22 11:06 bishabosha

performance test scheduled: 7 job(s) in queue, 1 running.

dottybot avatar Jun 01 '22 11:06 dottybot

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/15296/ to see the changes.

Benchmarks is based on merging with main (89c3c109ea)

dottybot avatar Jun 01 '22 13:06 dottybot

test performance please

szymon-rd avatar Jun 01 '22 13:06 szymon-rd

performance test scheduled: 6 job(s) in queue, 1 running.

dottybot avatar Jun 01 '22 13:06 dottybot

A few years ago, I wrote micro-benchmarks for Dotty lazy val implementations: https://github.com/allanrenucci/dotty-benchmark/tree/lazy-vals-v2

You may be interested in running these benchmarks against the new implementation.

allanrenucci avatar Jun 01 '22 14:06 allanrenucci

@allanrenucci thanks, I will use it

szymon-rd avatar Jun 01 '22 14:06 szymon-rd

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/15296/ to see the changes.

Benchmarks is based on merging with main (3ddb21c687)

dottybot avatar Jun 01 '22 17:06 dottybot

test performance please

szymon-rd avatar Jun 06 '22 14:06 szymon-rd

performance test scheduled: 1 job(s) in queue, 1 running.

dottybot avatar Jun 06 '22 14:06 dottybot

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/15296/ to see the changes.

Benchmarks is based on merging with main (e71fe80ff4)

dottybot avatar Jun 06 '22 17:06 dottybot

@allanrenucc I think that some files may be missing on this branch - LazyVolatile and LazySimCellWithPublicBitmap are not there. Do you happen to know where to find them?

szymon-rd avatar Jun 08 '22 10:06 szymon-rd

I don't fully remember what these objects refer to. LazyVolatile looks very similar to scala.runtime.LazyVals. Not sure about LazySimCellWithPublicBitmap .

allanrenucci avatar Jun 08 '22 13:06 allanrenucci

Result of running benchmark on 3.1.2 vs this branch (average from jmh):

Benchmark Time - 3.1.2 Time - New
Contended Init - 5kk, 8 threads 939.305ms (+/- 23.7ms) 477.659ms (+/- 15.1ms)
Contended Init - 5kk, 4 threads 602.388ms (+/- 31.7ms) 348.173ms (+/- 7.2ms)
Contended Init - 5kk, 2 threads 423.829ms (+/- 23.2ms) 516.602ms (+/- 6.8ms)
Uncontended init 18.633ns (+/- 0.7ns) 21.124ns (+/- 0.9ns)
Initialized access 2.198ns (+/- 0.0ns) 2.774ns (+/- 0.1ns)

There is much better performance for contended multi-threaded access (for some cases 2 times faster, but depends on number of threads and problem), but the performance for single threaded access is slower (about 25%).

Tests from https://github.com/allanrenucci/dotty-benchmark/tree/lazy-vals-v2

szymon-rd avatar Jun 09 '22 13:06 szymon-rd

Maybe the initialised read case should come first. I suspect this is the most common case. I.e.

def x: A =
  while true do
    val current: AnyRef = _x
    if current.isInstanceOf[A] then
      return current.asInstanceOf[A]
    else ...

Another thing to consider is to provide a small accessor method that the JVM could easily inline. E.g.

def x: A = if current.isInstanceOf[A] then current.asInstanceOf[A] else computeX()

This was suggested by @sjrd in #5478.

allanrenucci avatar Jun 09 '22 18:06 allanrenucci

@sjrd Could you take a look again? I addressed everything I think. I will look a bit into the performance to see if I can change anything in single-threaded case.

szymon-rd avatar Jun 10 '22 13:06 szymon-rd

I think we need a decision whether the performance hit for the uncontended init case and the initialized access case is acceptable (or whether we can improve the performance). I am not so sure about this, in fact. I believe these cases matter the most, by far. The way I think about it is that initialized access would happen about an order of magnitude more often than uncontended init, and both are several orders of magnitude more common than the contended cases.

odersky avatar Jun 10 '22 14:06 odersky

I agree, I will investigate the performance.

szymon-rd avatar Jun 10 '22 14:06 szymon-rd

it would be interesting also to compare how lazy vals perform with a large number of the new virtual threads in Jdk 19

bishabosha avatar Jun 10 '22 14:06 bishabosha

Alternative flows that I mentioned off-line during the meeting:

  • Have Waiting, Evaluating and NullValue all extend a common superclass. Then type-test for that class first. Hopefully that will get the code more quickly to the default, fully-initialized code path.
  • If that's not enough, make all 3 the same final class. Evaluating and NullValue would then be two magical instances of Waiting that are not actually waiting.
  • Otherwise, as mentioned above, type-test for the type of the lazy val first (only when it's not a supertype of Waiting).

sjrd avatar Jun 13 '22 15:06 sjrd

Ok so after working on it a bit I think I will implement an algorithm like this:

  private var _x: AnyRef = null

  def x: A =
    if !(_x == null || _x.isInstanceOf[LazyValControlState]) then
      return _x.asInstanceOf[A]
    else if _x == NullValue then
      return null
    else 
      return x_compute()
  
  private def x_compute() = 
     while true do
       val current: AnyRef = _x
       if current == null then
         if CAS(_x, null, Evaluating) then
           var result: AnyRef = null
           try
             result = rhs
             return result
           finally
              if result == null then result = NullValue
              if !CAS(_x, Evaluating, result) then
                val lock = _x.asInstanceOf[Waiting]
                CAS(_x, lock, result)
                lock.release()
       else
         if current.isInstanceOf[LazyValControlState] then
          if CAS(current, Evaluating, new Waiting) then
            <EmptyTree>
          else if current.isInstanceOf[NullValue] then
            return null
          else if current.isInstanceOf[Waiting] then
            current.asInstanceOf[Waiting].await()
         else
           return current.asInstanceOf[A]
     end while

szymon-rd avatar Jun 23 '22 13:06 szymon-rd

I wonder if using a special value for Uninitialized would be more efficient? It removes some some branches. (this might have been suggested before, but I don't remember where)

 private var _x: AnyRef = Uninitialized // subclass of LazyValControlState

  def x: A =
    if !_x.isInstanceOf[LazyValControlState] then
      return _x.asInstanceOf[A]
    else 
      return x_compute()
  
  private def x_compute() = 
     while true do
       val current: AnyRef = _x
       if current == Uninitialized then
         if CAS(_x, Uninitialized, Evaluating) then
           var result: AnyRef = Uninitialized
           try
             result = rhs
             return result
           finally
              if !CAS(_x, Evaluating, result) then
                val lock = _x.asInstanceOf[Waiting]
                CAS(_x, lock, result)
                lock.release()
       else
         if current.isInstanceOf[LazyValControlState] then
          if CAS(current, Evaluating, new Waiting) then
            <EmptyTree>
          else if current.isInstanceOf[Waiting] then
            current.asInstanceOf[Waiting].await()
         else
           return current.asInstanceOf[A]
     end while

TheElectronWill avatar Jun 27 '22 07:06 TheElectronWill

The problem with that is that you have to actually initialize the field to Uninitialized in the constructor. The idea of using null is that we get it for free when creating the object. That said, it may be worth a try, of course.

sjrd avatar Jun 27 '22 07:06 sjrd

I was thinking about it, but it seemed a bit problematic because it introduces an additional state - null before Uninitialized. If the value is referenced in the constructor before initialization of lazy val, then we would have a problem, for example:

Right now in this snippet:

object A {
 val x = y
 lazy val y = "foo"
}

The x is equal to "foo". If we make implement the Uninitialized approach for lazy vals, and treat null as null value, we face two problems:

  • Firstly, the value of x would be equal to null, as in the constructor the val x = y would be called before assigning Uninitialized to the y. And the null in y would be interpreted as a null value.
  • Secondly, it poses a problem in e.g. integer values. If in the example above we had a lazy val y = 3, I believe we would get a runtime error for assigning null to val of type Int. I think we would need to handle these cases separately which would cause us to add additional if-statement to the getter anyway.

Nevertheless, I think that performance-wise it's cheaper to initialize the lazy val in constructor once than to have additional if-statement on every call. However, if we do that, we are still left with the problems above.

szymon-rd avatar Jun 27 '22 11:06 szymon-rd

You can "early"-initialize it, before calling the super constructor. Before that, no code can possibly refer to it.

sjrd avatar Jun 27 '22 11:06 sjrd

I am not sure if I understand. If I have a snippet like this:

object A {
 val x = y
 lazy val y = "foo"
}

Then I think it would be compiled to something like this:

object A {
  val x = null
  val _y = null

  <init> {
      x = y() // -> null
      _y = Unitialized
   }

   def y() = { //simplified
     if _y == null then
       return null
    else ...
  }
}

It seems that the only way to fix it is to move y initialization up in the constructor, but then it becomes weird when there are many lazy vals. I may be missing something important here.

Edit: Resolved, look at comment below.

szymon-rd avatar Jun 27 '22 11:06 szymon-rd

Oh, ok, now I see. If I put all the lazy = Uninitialized as first instructions in the constructor, then it will work.

szymon-rd avatar Jun 27 '22 11:06 szymon-rd

What would it take to leave the old implementation available under a Java system property?

Especially if we do this — but even if we don’t — I think this is a PR that would especially benefit from some good user-facing documentation — in one place, not scattered around multiple issues and PRs — of what the possible impacts on end users might be. (For example, https://github.com/lampepfl/dotty/pull/6979 has a nice pros-and-cons list, but I'm not sure how up-to-date it still is, and I had to poke around for a while to find it.)

SethTisue avatar Jul 07 '22 23:07 SethTisue