bug icon indicating copy to clipboard operation
bug copied to clipboard

Unnecessary boxing match in destructuring tuple assignment

Open szeiger opened this issue 6 years ago • 21 comments

In 2.12 and 2.13-M3ish:

scala> class X { def foo = (1,2,3); def bar { val (x,y,z) = foo } }
defined class X

scala> :javap X
...

  public scala.Tuple3<java.lang.Object, java.lang.Object, java.lang.Object> foo();
...
  public void bar();
...
         0: aload_0
         1: invokevirtual #33                 // Method foo:()Lscala/Tuple3;
         4: astore_3
         5: aload_3
         6: ifnull        62
         9: aload_3
        10: invokevirtual #37                 // Method scala/Tuple3._1:()Ljava/lang/Object;
        13: invokestatic  #41                 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
        16: istore        4
        18: aload_3
        19: invokevirtual #44                 // Method scala/Tuple3._2:()Ljava/lang/Object;
        22: invokestatic  #41                 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
        25: istore        5
        27: aload_3
        28: invokevirtual #47                 // Method scala/Tuple3._3:()Ljava/lang/Object;
        31: invokestatic  #41                 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
        34: istore        6
        36: new           #17                 // class scala/Tuple3
        39: dup
        40: iload         4
        42: invokestatic  #23                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
        45: iload         5
        47: invokestatic  #23                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
        50: iload         6
        52: invokestatic  #23                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
        55: invokespecial #27                 // Method scala/Tuple3."<init>":(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)V
        58: astore_1
        59: goto          74
        62: goto          65
        65: new           #49                 // class scala/MatchError
        68: dup
        69: aload_3
        70: invokespecial #52                 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
        73: athrow
        74: aload_1
        75: astore_2
        76: aload_2
        77: invokevirtual #37                 // Method scala/Tuple3._1:()Ljava/lang/Object;
        80: invokestatic  #41                 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
        83: istore        7
        85: aload_2
        86: invokevirtual #44                 // Method scala/Tuple3._2:()Ljava/lang/Object;
        89: invokestatic  #41                 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
        92: istore        8
        94: aload_2
        95: invokevirtual #47                 // Method scala/Tuple3._3:()Ljava/lang/Object;
        98: invokestatic  #41                 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
       101: istore        9
       103: return

Boxing jokes aside the problem is clearer when using values of type AnyRef to simplify the bytecode (no boxing, unboxing or checkcast):

         0: aload_0
         1: invokevirtual #29                 // Method foo:()Lscala/Tuple3;
         4: astore_3
         5: aload_3
         6: ifnull        44
         9: aload_3
        10: invokevirtual #33                 // Method scala/Tuple3._1:()Ljava/lang/Object;
        13: astore        4
        15: aload_3
        16: invokevirtual #36                 // Method scala/Tuple3._2:()Ljava/lang/Object;
        19: astore        5
        21: aload_3
        22: invokevirtual #39                 // Method scala/Tuple3._3:()Ljava/lang/Object;
        25: astore        6
        27: new           #17                 // class scala/Tuple3
        30: dup
        31: aload         4
        33: aload         5
        35: aload         6
        37: invokespecial #24                 // Method scala/Tuple3."<init>":(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)V
        40: astore_1
        41: goto          56
        44: goto          47
        47: new           #41                 // class scala/MatchError
        50: dup
        51: aload_3
        52: invokespecial #44                 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
        55: athrow
        56: aload_1
        57: astore_2
        58: aload_2
        59: invokevirtual #33                 // Method scala/Tuple3._1:()Ljava/lang/Object;
        62: astore        7
        64: aload_2
        65: invokevirtual #36                 // Method scala/Tuple3._2:()Ljava/lang/Object;
        68: astore        8
        70: aload_2
        71: invokevirtual #39                 // Method scala/Tuple3._3:()Ljava/lang/Object;
        74: astore        9
        76: return

Based on my rusty JVM assembly skills, this translates to:

val t1 = foo
val t2 = if(t1 eq null) throw new MatchError
else {
  val x = t1._1
  val y = t1._2
  val z = t2._3
  new Tuple3(x, y, z)
}
val x = t2._1
val y = t2._2
val z = t2._3

szeiger avatar Mar 09 '18 14:03 szeiger

Under 2.12.4, with opt:l:method:

  public void bar();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=3, locals=2, args_size=1
         0: aload_0
         1: invokevirtual #28                 // Method foo:()Lscala/Tuple3;
         4: astore_1
         5: aload_1
         6: ifnull        25
         9: aload_1
        10: invokevirtual #32                 // Method scala/Tuple3._1:()Ljava/lang/Object;
        13: pop
        14: aload_1
        15: invokevirtual #35                 // Method scala/Tuple3._2:()Ljava/lang/Object;
        18: pop
        19: aload_1
        20: invokevirtual #38                 // Method scala/Tuple3._3:()Ljava/lang/Object;
        23: pop
        24: return
        25: new           #40                 // class scala/MatchError
        28: dup
        29: aload_1
        30: invokespecial #43                 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
        33: athrow
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      34     0  this   LX;
      LineNumberTable:
        line 1: 0
      StackMapTable: number_of_entries = 1
        frame_type = 252 /* append */
          offset_delta = 25
          locals = [ class scala/Tuple3 ]

which is more like

val tmp_1 = foo
if (tmp_1 eq null)
 throw new MatchError(tmp_1) 
else {
  foo._1
  foo._2
  foo._3
}

as I imagine is what you expected.

Does the strawman not use the optimizer?

hrhino avatar Mar 09 '18 15:03 hrhino

Yeah, -opt:l:method can handle this. Convenience tip: use cfr-decompiler (http://www.benf.org/other/cfr/)

 sandbox git:(travis-trigger-dist) scalac Test.scala -opt:l:method
 sandbox git:(travis-trigger-dist) cfr-decompiler X.class
...
    public void bar() {
        Tuple3<Object, Object, Object> tuple3 = this.foo();
        if (tuple3 != null) {
            tuple3._1();
            tuple3._2();
            tuple3._3();
            return;
        }
        throw new MatchError(tuple3);
    }
...

lrytz avatar Mar 09 '18 15:03 lrytz

Is there a good reason why that extra trip through Tuple3 is there without the extra optimization?

Jasper-M avatar Mar 09 '18 15:03 Jasper-M

I would expect this kind of code never to come out of patmat rather than relying on bytecode optimizations.

szeiger avatar Mar 09 '18 15:03 szeiger

I would expect this kind of code never to come out of patmat rather than relying on bytecode optimizations.

Once an optimizer can handle this, it can also handle the same code when it arises elsewhere (say from inlining), and then you can simplify your translations by relying on later optimizations, rather than duplicating similar logic (or more complicated special cases).

Blaisorblade avatar Mar 09 '18 21:03 Blaisorblade

@lrytz and @Blaisorblade - The optimizer (at least opt:l:method) should be on by default, then.

That the compiler ever produces code like this by default should be considered a bug. How, internally, it is avoided is less important than it not appearing in bytecode unless requested.

Note that case classes have the same problem.

Ichoran avatar Mar 09 '18 22:03 Ichoran

Addendum: if you run a trivial benchmark like comparing n and m in

class K(s: String) {
  val tup = (s.take(1), s, s.drop(2))
  def n = { 
    val (a, b, c) = tup
    a.length + b.length + c.length
  }
  def m = { 
    val t = tup
    t._1.length + t._2.length + t._3.length
  }
}

you find that the latter runs ~18% faster. The JIT compiler cannot (completely) fix this.

Ichoran avatar Mar 09 '18 22:03 Ichoran

I totally agree that performance-conscious people should default to using the optimizer for production builds. And thanks for doing the work and showing the JIT can't remove this overhead entirely—I've believed this for long (mostly based on comments by Paul) but I never had something to cite.

And look at my nice headline: https://twitter.com/Blaisorblade/status/972258678337359872 (any needed corrections welcome); I didn't wait to see the benchmark and check you got it right because you know better than me.

The optimizer (at least opt:l:method) should be on by default, then.

If that argument worked, C compilers should default to -O1 because of the garbage assembly they produce at -O0. What's true is that users default to optimizations on in C but do not in Scala. Also, C optimizers cause significant slowdowns, dunno if this is a concern for Scalac nowadays or with planned optimizers.

Blaisorblade avatar Mar 09 '18 23:03 Blaisorblade

I think that most users are not aware of the optimizer options. Documentation about the compiler options in general is not as easy to find as it could be IMO, perhaps that deserves a separate issue.

DavidGregory084 avatar Mar 10 '18 01:03 DavidGregory084

@Blaisorblade - Historically, -optimise has done nothing, so people are not used to using it for Scala (nor is anyone from Java!), and if the compiler is going to produce slow code unless you optimize, there needs to be a major outreach effort informing people of such; and, -O2 is immensely easier to type and listed everywhere in examples of C/C++, whereas -opt:l:method is hard to type and almost impossible to find.

So this isn't a very compelling argument. Adding -O to the scala compiler would probably be okay (given appropriate announcements).

Ichoran avatar Mar 10 '18 01:03 Ichoran

From a tooling perspective, it would be nice if build tools like SBT would automatically enable or disable some default level of optimisation depending on whether it's a release or snapshot build (similar to what Cargo does for Rust builds)

NthPortal avatar Mar 10 '18 02:03 NthPortal

There's lots of actionable and constructive ideas; some could be ready for issues or documentation PRs. I suspect at this point a discussion on contributors might be best, though having defaults in build tools seems closer to usual practice.

@NthPortal That makes sense to me, build tools have more information. In fact, it seems SBT has enough info to enable interprocedural optimizations across a library when releasing it. And without interprocedural optimizations you probably get similarly bad performance, just split across method calls.

@Ichoran I just meant "your particular argument proves too much, so something's wrong". And yeah, I'm aware of the cultural differences. But Scalac can't enable interprocedural optimizations (on which code) and slower compile times by default would also be a regression, so for now build tools sounds better.

Blaisorblade avatar Mar 10 '18 12:03 Blaisorblade

@Blaisorblade - My argument is that the compiler shouldn't locally produce inefficient code unless an optimizer is always on to clean it up. There are plenty of complex cases where it's reasonable to generate the simpler and less efficient code unless someone asks for the more sophisticated analysis to try to improve it. My argument wasn't that all possible analysis should be done by default.

Here, unnecessarily complex code is being produced to begin with. If it's slow to generate bad code and clean it up, then the compiler shouldn't generate bad code to begin with. This goes for for comprehensions as well (the desugaring of = is unfortunate). I can understand if everyone is too busy to fix it right now, but I can't understand this being "not a bug". Either it's a compile speed bug, where you're requiring extra work for no reason; or it's a code performance bug, where either the generation should be improved or the optimizer should get to have a go at it.

Ichoran avatar Mar 11 '18 01:03 Ichoran

Dup of #8790. And discussed on Gitter around https://gitter.im/scala/contributors?at=5aa4a5fa458cbde5571d9524.

Blaisorblade avatar Mar 11 '18 03:03 Blaisorblade

Reading this makes me sad. I am certainly not an expert in this area but here are a few thoughts

Optimisation is part of the back end genBcode isn't it This means that it is specific to the particular back end (bytecode js native etc) doesn't this mean that any optimization end up being duplicated or not in each of these generators

I never understood why all optimisation is back end specific. Maybe it's a good time to address this

Method level optimisation can run in parallel now but but generation of a more efficient free in the first place would make all subsequent phases faster

Running different build flags for test and prod code is not really an option in an agile environment IMO and we already burn many thousands of hours of cpu every day in compile and CI

I thing that the compiler should generate idiomatic code that should be reasonably efficient and optimization should be icing on the cake

mkeskells avatar Mar 11 '18 16:03 mkeskells

Many optimizations are backend-specific. You don't optimize the same way for the JVM or for JS or for LLVM. So in general, it is good for the performance on all platforms that each one has its own optimizer. In addition, the ones that are common are typically easy to implement, so the duplication is not a big deal.

That said, I've always thought the pattern matcher produced pretty bad code by default, and I'm not sure we wouldn't be able to produce better code right from the start.

sjrd avatar Mar 11 '18 18:03 sjrd

Documentation about the compiler options in general is not as easy to find as it could be IMO, perhaps that deserves a separate issue

issue: https://github.com/scala/docs.scala-lang/issues/1058 brand new PR: https://github.com/scala/docs.scala-lang/pull/1063

SethTisue avatar May 02 '18 16:05 SethTisue

/cc @mkeskells @rorygraves I believe this to be the ticket of reference for the issue we were discussing. We can take another look at this to see if we can get rid of the problem in val pattern desguraring, rather than relying on opt:l:method being used.

retronym avatar Aug 23 '18 10:08 retronym

Cool. I had forgotten about this ticket.

rorygraves avatar Aug 23 '18 10:08 rorygraves

The Gitter convo also points to the latest efforts in scala/scala#4376.

Blaisorblade avatar Aug 23 '18 13:08 Blaisorblade

current repro

➜  snips scala -J--enable-preview -J--add-exports -Jjdk.jdeps/com.sun.tools.javap=ALL-UNNAMED -nobootcp -Xlint
Welcome to Scala 2.13.13 (OpenJDK 64-Bit Server VM, Java 21.0.2).
Type in expressions for evaluation. Or try :help.

scala> class X { def foo = (1,2,3); def bar(): Unit = { val (x,y,z) = foo } }
class X

scala> :javap X#bar
  public void bar();

also

-opt:local

som-snytt avatar Apr 10 '24 04:04 som-snytt