bug
bug copied to clipboard
Unnecessary boxing match in destructuring tuple assignment
In 2.12 and 2.13-M3ish:
scala> class X { def foo = (1,2,3); def bar { val (x,y,z) = foo } }
defined class X
scala> :javap X
...
public scala.Tuple3<java.lang.Object, java.lang.Object, java.lang.Object> foo();
...
public void bar();
...
0: aload_0
1: invokevirtual #33 // Method foo:()Lscala/Tuple3;
4: astore_3
5: aload_3
6: ifnull 62
9: aload_3
10: invokevirtual #37 // Method scala/Tuple3._1:()Ljava/lang/Object;
13: invokestatic #41 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
16: istore 4
18: aload_3
19: invokevirtual #44 // Method scala/Tuple3._2:()Ljava/lang/Object;
22: invokestatic #41 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
25: istore 5
27: aload_3
28: invokevirtual #47 // Method scala/Tuple3._3:()Ljava/lang/Object;
31: invokestatic #41 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
34: istore 6
36: new #17 // class scala/Tuple3
39: dup
40: iload 4
42: invokestatic #23 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
45: iload 5
47: invokestatic #23 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
50: iload 6
52: invokestatic #23 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
55: invokespecial #27 // Method scala/Tuple3."<init>":(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)V
58: astore_1
59: goto 74
62: goto 65
65: new #49 // class scala/MatchError
68: dup
69: aload_3
70: invokespecial #52 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
73: athrow
74: aload_1
75: astore_2
76: aload_2
77: invokevirtual #37 // Method scala/Tuple3._1:()Ljava/lang/Object;
80: invokestatic #41 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
83: istore 7
85: aload_2
86: invokevirtual #44 // Method scala/Tuple3._2:()Ljava/lang/Object;
89: invokestatic #41 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
92: istore 8
94: aload_2
95: invokevirtual #47 // Method scala/Tuple3._3:()Ljava/lang/Object;
98: invokestatic #41 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
101: istore 9
103: return
Boxing jokes aside the problem is clearer when using values of type AnyRef to simplify the bytecode (no boxing, unboxing or checkcast
):
0: aload_0
1: invokevirtual #29 // Method foo:()Lscala/Tuple3;
4: astore_3
5: aload_3
6: ifnull 44
9: aload_3
10: invokevirtual #33 // Method scala/Tuple3._1:()Ljava/lang/Object;
13: astore 4
15: aload_3
16: invokevirtual #36 // Method scala/Tuple3._2:()Ljava/lang/Object;
19: astore 5
21: aload_3
22: invokevirtual #39 // Method scala/Tuple3._3:()Ljava/lang/Object;
25: astore 6
27: new #17 // class scala/Tuple3
30: dup
31: aload 4
33: aload 5
35: aload 6
37: invokespecial #24 // Method scala/Tuple3."<init>":(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)V
40: astore_1
41: goto 56
44: goto 47
47: new #41 // class scala/MatchError
50: dup
51: aload_3
52: invokespecial #44 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
55: athrow
56: aload_1
57: astore_2
58: aload_2
59: invokevirtual #33 // Method scala/Tuple3._1:()Ljava/lang/Object;
62: astore 7
64: aload_2
65: invokevirtual #36 // Method scala/Tuple3._2:()Ljava/lang/Object;
68: astore 8
70: aload_2
71: invokevirtual #39 // Method scala/Tuple3._3:()Ljava/lang/Object;
74: astore 9
76: return
Based on my rusty JVM assembly skills, this translates to:
val t1 = foo
val t2 = if(t1 eq null) throw new MatchError
else {
val x = t1._1
val y = t1._2
val z = t2._3
new Tuple3(x, y, z)
}
val x = t2._1
val y = t2._2
val z = t2._3
Under 2.12.4, with opt:l:method
:
public void bar();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=3, locals=2, args_size=1
0: aload_0
1: invokevirtual #28 // Method foo:()Lscala/Tuple3;
4: astore_1
5: aload_1
6: ifnull 25
9: aload_1
10: invokevirtual #32 // Method scala/Tuple3._1:()Ljava/lang/Object;
13: pop
14: aload_1
15: invokevirtual #35 // Method scala/Tuple3._2:()Ljava/lang/Object;
18: pop
19: aload_1
20: invokevirtual #38 // Method scala/Tuple3._3:()Ljava/lang/Object;
23: pop
24: return
25: new #40 // class scala/MatchError
28: dup
29: aload_1
30: invokespecial #43 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
33: athrow
LocalVariableTable:
Start Length Slot Name Signature
0 34 0 this LX;
LineNumberTable:
line 1: 0
StackMapTable: number_of_entries = 1
frame_type = 252 /* append */
offset_delta = 25
locals = [ class scala/Tuple3 ]
which is more like
val tmp_1 = foo
if (tmp_1 eq null)
throw new MatchError(tmp_1)
else {
foo._1
foo._2
foo._3
}
as I imagine is what you expected.
Does the strawman not use the optimizer?
Yeah, -opt:l:method
can handle this. Convenience tip: use cfr-decompiler (http://www.benf.org/other/cfr/)
sandbox git:(travis-trigger-dist) scalac Test.scala -opt:l:method
sandbox git:(travis-trigger-dist) cfr-decompiler X.class
...
public void bar() {
Tuple3<Object, Object, Object> tuple3 = this.foo();
if (tuple3 != null) {
tuple3._1();
tuple3._2();
tuple3._3();
return;
}
throw new MatchError(tuple3);
}
...
Is there a good reason why that extra trip through Tuple3 is there without the extra optimization?
I would expect this kind of code never to come out of patmat rather than relying on bytecode optimizations.
I would expect this kind of code never to come out of patmat rather than relying on bytecode optimizations.
Once an optimizer can handle this, it can also handle the same code when it arises elsewhere (say from inlining), and then you can simplify your translations by relying on later optimizations, rather than duplicating similar logic (or more complicated special cases).
@lrytz and @Blaisorblade - The optimizer (at least opt:l:method) should be on by default, then.
That the compiler ever produces code like this by default should be considered a bug. How, internally, it is avoided is less important than it not appearing in bytecode unless requested.
Note that case classes have the same problem.
Addendum: if you run a trivial benchmark like comparing n
and m
in
class K(s: String) {
val tup = (s.take(1), s, s.drop(2))
def n = {
val (a, b, c) = tup
a.length + b.length + c.length
}
def m = {
val t = tup
t._1.length + t._2.length + t._3.length
}
}
you find that the latter runs ~18% faster. The JIT compiler cannot (completely) fix this.
I totally agree that performance-conscious people should default to using the optimizer for production builds. And thanks for doing the work and showing the JIT can't remove this overhead entirely—I've believed this for long (mostly based on comments by Paul) but I never had something to cite.
And look at my nice headline: https://twitter.com/Blaisorblade/status/972258678337359872 (any needed corrections welcome); I didn't wait to see the benchmark and check you got it right because you know better than me.
The optimizer (at least opt:l:method) should be on by default, then.
If that argument worked, C compilers should default to -O1
because of the garbage assembly they produce at -O0
.
What's true is that users default to optimizations on in C but do not in Scala. Also, C optimizers cause significant slowdowns, dunno if this is a concern for Scalac nowadays or with planned optimizers.
I think that most users are not aware of the optimizer options. Documentation about the compiler options in general is not as easy to find as it could be IMO, perhaps that deserves a separate issue.
@Blaisorblade - Historically, -optimise has done nothing, so people are not used to using it for Scala (nor is anyone from Java!), and if the compiler is going to produce slow code unless you optimize, there needs to be a major outreach effort informing people of such; and, -O2 is immensely easier to type and listed everywhere in examples of C/C++, whereas -opt:l:method is hard to type and almost impossible to find.
So this isn't a very compelling argument. Adding -O to the scala compiler would probably be okay (given appropriate announcements).
From a tooling perspective, it would be nice if build tools like SBT would automatically enable or disable some default level of optimisation depending on whether it's a release or snapshot build (similar to what Cargo does for Rust builds)
There's lots of actionable and constructive ideas; some could be ready for issues or documentation PRs. I suspect at this point a discussion on contributors might be best, though having defaults in build tools seems closer to usual practice.
@NthPortal That makes sense to me, build tools have more information. In fact, it seems SBT has enough info to enable interprocedural optimizations across a library when releasing it. And without interprocedural optimizations you probably get similarly bad performance, just split across method calls.
@Ichoran I just meant "your particular argument proves too much, so something's wrong". And yeah, I'm aware of the cultural differences. But Scalac can't enable interprocedural optimizations (on which code) and slower compile times by default would also be a regression, so for now build tools sounds better.
@Blaisorblade - My argument is that the compiler shouldn't locally produce inefficient code unless an optimizer is always on to clean it up. There are plenty of complex cases where it's reasonable to generate the simpler and less efficient code unless someone asks for the more sophisticated analysis to try to improve it. My argument wasn't that all possible analysis should be done by default.
Here, unnecessarily complex code is being produced to begin with. If it's slow to generate bad code and clean it up, then the compiler shouldn't generate bad code to begin with. This goes for for
comprehensions as well (the desugaring of =
is unfortunate). I can understand if everyone is too busy to fix it right now, but I can't understand this being "not a bug". Either it's a compile speed bug, where you're requiring extra work for no reason; or it's a code performance bug, where either the generation should be improved or the optimizer should get to have a go at it.
Dup of #8790. And discussed on Gitter around https://gitter.im/scala/contributors?at=5aa4a5fa458cbde5571d9524.
Reading this makes me sad. I am certainly not an expert in this area but here are a few thoughts
Optimisation is part of the back end genBcode isn't it This means that it is specific to the particular back end (bytecode js native etc) doesn't this mean that any optimization end up being duplicated or not in each of these generators
I never understood why all optimisation is back end specific. Maybe it's a good time to address this
Method level optimisation can run in parallel now but but generation of a more efficient free in the first place would make all subsequent phases faster
Running different build flags for test and prod code is not really an option in an agile environment IMO and we already burn many thousands of hours of cpu every day in compile and CI
I thing that the compiler should generate idiomatic code that should be reasonably efficient and optimization should be icing on the cake
Many optimizations are backend-specific. You don't optimize the same way for the JVM or for JS or for LLVM. So in general, it is good for the performance on all platforms that each one has its own optimizer. In addition, the ones that are common are typically easy to implement, so the duplication is not a big deal.
That said, I've always thought the pattern matcher produced pretty bad code by default, and I'm not sure we wouldn't be able to produce better code right from the start.
Documentation about the compiler options in general is not as easy to find as it could be IMO, perhaps that deserves a separate issue
issue: https://github.com/scala/docs.scala-lang/issues/1058 brand new PR: https://github.com/scala/docs.scala-lang/pull/1063
/cc @mkeskells @rorygraves I believe this to be the ticket of reference for the issue we were discussing. We can take another look at this to see if we can get rid of the problem in val pattern desguraring, rather than relying on opt:l:method
being used.
Cool. I had forgotten about this ticket.
The Gitter convo also points to the latest efforts in scala/scala#4376.
current repro
➜ snips scala -J--enable-preview -J--add-exports -Jjdk.jdeps/com.sun.tools.javap=ALL-UNNAMED -nobootcp -Xlint
Welcome to Scala 2.13.13 (OpenJDK 64-Bit Server VM, Java 21.0.2).
Type in expressions for evaluation. Or try :help.
scala> class X { def foo = (1,2,3); def bar(): Unit = { val (x,y,z) = foo } }
class X
scala> :javap X#bar
public void bar();
also
-opt:local