scalac_perf
scalac_perf copied to clipboard
List should use eq Nil instead of isEmpty for end of list detection
var these = this
while (!these.isEmpty) {
f(these.head)
these = these.tail
}
}
could be
var these = this
while (these ne Nil) {
f(these.head)
these = these.tail
}
}
also MurmurHash3
final def listHash(xs: scala.collection.immutable.List[_], seed: Int): Int = {
var n = 0
var h = seed
var elems = xs
while (!elems.isEmpty) {
val head = elems.head
val tail = elems.tail
h = mix(h, head.##)
n += 1
elems = tail
}
finalizeHash(h, n)
}
@mkeskells I wrote a micro benchmark to compare the performance of list eq Nil and list.isEmpty , PR here , and the benchmark results is just as follows: There seems no obvious difference between their performance, I will further investigate their bytecode to understand the mechanism underneath. 😄
scala.collection.immutable.List.length
Eq Nil
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.call_length 0 avgt 20 3.953 ± 0.221 ns/op
[info] ListBenchmark.call_length 1 avgt 20 6.838 ± 1.113 ns/op
[info] ListBenchmark.call_length 10 avgt 20 17.357 ± 1.756 ns/op
[info] ListBenchmark.call_length 100 avgt 20 216.360 ± 2.863 ns/op
[info] ListBenchmark.call_length 1000 avgt 20 2432.341 ± 60.765 ns/op
.isEmpty
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.call_length 0 avgt 20 3.568 ± 0.114 ns/op
[info] ListBenchmark.call_length 1 avgt 20 4.301 ± 0.482 ns/op
[info] ListBenchmark.call_length 10 avgt 20 14.973 ± 0.353 ns/op
[info] ListBenchmark.call_length 100 avgt 20 208.602 ± 2.478 ns/op
[info] ListBenchmark.call_length 1000 avgt 20 2310.960 ± 24.554 ns/op
scala.collection.immutable.List.lengthCompare
eq Nil
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.call_lengthCompare 0 avgt 20 5.678 ± 3.099 ns/op
[info] ListBenchmark.call_lengthCompare 1 avgt 20 4.057 ± 0.207 ns/op
[info] ListBenchmark.call_lengthCompare 10 avgt 20 8.789 ± 0.324 ns/op
[info] ListBenchmark.call_lengthCompare 100 avgt 20 97.562 ± 8.604 ns/op
[info] ListBenchmark.call_lengthCompare 1000 avgt 20 1012.257 ± 20.556 ns/op
.isEmpty
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.call_lengthCompare 0 avgt 20 4.135 ± 0.111 ns/op
[info] ListBenchmark.call_lengthCompare 1 avgt 20 4.038 ± 0.088 ns/op
[info] ListBenchmark.call_lengthCompare 10 avgt 20 9.278 ± 2.062 ns/op
[info] ListBenchmark.call_lengthCompare 100 avgt 20 90.333 ± 2.914 ns/op
[info] ListBenchmark.call_lengthCompare 1000 avgt 20 999.997 ± 20.171 ns/op
scala.collection.immutable.List.dropwhile
Eq Nil
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.call_dropWhile 0 avgt 20 3.567 ± 0.298 ns/op
[info] ListBenchmark.call_dropWhile 1 avgt 20 5.949 ± 0.970 ns/op
[info] ListBenchmark.call_dropWhile 10 avgt 20 21.641 ± 0.801 ns/op
[info] ListBenchmark.call_dropWhile 100 avgt 20 248.440 ± 6.461 ns/op
[info] ListBenchmark.call_dropWhile 1000 avgt 20 2805.942 ± 500.897 ns/op
.isEmpty
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.call_dropWhile 0 avgt 20 3.620 ± 0.359 ns/op
[info] ListBenchmark.call_dropWhile 1 avgt 20 4.638 ± 0.049 ns/op
[info] ListBenchmark.call_dropWhile 10 avgt 20 24.545 ± 14.729 ns/op
[info] ListBenchmark.call_dropWhile 100 avgt 20 340.365 ± 138.329 ns/op
[info] ListBenchmark.call_dropWhile 1000 avgt 20 2704.003 ± 406.652 ns/op
scala.util.hashing.MurmurHash3.listHash
Eq Nil
[info] Benchmark (size) Mode Cnt Score Error Units
[info] MurmurHash3Benchmark.call_listHash 0 avgt 20 6.428 ± 0.798 ns/op
[info] MurmurHash3Benchmark.call_listHash 1 avgt 20 19.277 ± 2.439 ns/op
[info] MurmurHash3Benchmark.call_listHash 10 avgt 20 125.612 ± 13.692 ns/op
[info] MurmurHash3Benchmark.call_listHash 100 avgt 20 1152.770 ± 107.546 ns/op
[info] MurmurHash3Benchmark.call_listHash 1000 avgt 20 13605.814 ± 608.183 ns/op
.isEmpty
[info] Benchmark (size) Mode Cnt Score Error Units
[info] MurmurHash3Benchmark.call_listHash 0 avgt 20 6.052 ± 0.208 ns/op
[info] MurmurHash3Benchmark.call_listHash 1 avgt 20 11.668 ± 0.139 ns/op
[info] MurmurHash3Benchmark.call_listHash 10 avgt 20 112.381 ± 7.618 ns/op
[info] MurmurHash3Benchmark.call_listHash 100 avgt 20 1056.814 ± 32.572 ns/op
[info] MurmurHash3Benchmark.call_listHash 1000 avgt 20 12452.243 ± 459.762 ns/op
@feyman2016 I agree - this is suprising - I would be very interested into finding out what the bytecode looks like.
@rorygraves @mkeskells
I investigated the scala.util.hashing.MurmurHash3#listHash
as follows:
final def listHash(xs: scala.collection.immutable.List[_], seed: Int): Int = {
var n = 0
var h = seed
var elems = xs
while (!elems.isEmpty) {
val head = elems.head
val tail = elems.tail
h = mix(h, head.##)
n += 1
elems = tail
}
finalizeHash(h, n)
}
If we use while (!elems.isEmpty) {
, we wil get the bytecode like:
L6
LINENUMBER 166 L6
FRAME APPEND [I I scala/collection/immutable/List]
ALOAD 5
INVOKEVIRTUAL scala/collection/immutable/List.isEmpty ()Z
IFNE L7
or if we use while (elems ne Nil) {
, we will get :
L6
LINENUMBER 166 L6
FRAME APPEND [I I scala/collection/immutable/List]
ALOAD 5
GETSTATIC scala/collection/immutable/Nil$.MODULE$ : Lscala/collection/immutable/Nil$;
IF_ACMPEQ L7
These bytecode of both are just like what I thought they would be, which makes benchmark results even more difficult to explain.
Is it that the jit cannot determine that Nil is a constant
Can you try with a java class
Class holder{ final static List NIL = Nil }
What you can't see from the bytecode is what the jit does with this and the information that it deduces. If the jit knows it's a constant then it may get inlined. If the git cant do that then it's a static field access To complicate further java 9 seems to do a better job at this sometimes and particularly for invokedynamic stuff (which we don't have here I think)
That's why I suggest comparing to idiomatic java code
I tried to find some clue from the machine code of scala.collection.immutable.List.length
, results as below(line 339 is where we use eq Nil or isEmpty):
Eq Nil (bench = 100000)
0x000000010c5deece: movabs $0x76b6701e0,%r10 ; {oop(a 'java/lang/Class' = 'scala/collection/immutable/Nil$')}
[info] 0x000000010c5deed8: mov 0x68(%r10),%r11d ;*getstatic MODULE$
[info] ; - scala.collection.immutable.List::length@5 (line 339)
[info]
[info] 0x000000010c5deedc: mov %r11,%r10
[info] 0x000000010c5deedf: shl $0x3,%r10
[info] 0x000000010c5deee3: cmp %r10,%rsi
[info] 0x000000010c5deee6: je 0x000000010c5def15 ;*if_acmpeq
[info] ; - scala.collection.immutable.List::length@8 (line 339)
[info] 0x000000010fb38387: movabs $0x76b6701e8,%r10 ; {oop(a 'java/lang/Class' = 'scala/collection/immutable/Nil$')}
[info] 0x000000010fb38391: jmp 0x000000010fb383ae
[info] 0x000000010fb38393: nopw 0x0(%rax,%rax,1)
[info] 0x000000010fb3839c: data16 data16 xchg %ax,%ax
[info] 0x000000010fb389a3: movabs $0x76b6701e8,%rbx ; {oop(a 'java/lang/Class' = 'scala/collection/immutable/Nil$')}
[info] 0x000000010fb389ad: mov 0x68(%rbx),%ebx
[info] 0x000000010fb389b0: shl $0x3,%rbx ;*getstatic MODULE$
[info] ; - scala.collection.immutable.List::length@5 (line 339)
[info]
[info] 0x000000010fb389b4: cmp %rbx,%rsi
[info] 0x000000010fb389b7: movabs $0x10b70eaf0,%rbx ; {metadata(method data for {method} {0x000000010b675b88} 'length' '()I' in 'scala/collection/immutable/List')}
[info] 0x000000010fb389c1: movabs $0x108,%rax
[info] 0x000000010fb389cb: je 0x000000010fb389db
[info] 0x000000010fb389d1: movabs $0x118,%rax
[info] 0x000000010fb389db: mov (%rbx,%rax,1),%rdx
[info] 0x000000010fb389df: lea 0x1(%rdx),%rdx
[info] 0x000000010fb389e3: mov %rdx,(%rbx,%rax,1)
[info] 0x000000010fb389e7: jne 0x000000010fb387c8 ;*if_acmpeq
[info] ; - scala.collection.immutable.List::length@8 (line 339)
.isEmpty (bench = 100000)
[info] 0x0000000109c45075: mov 0x10(%rsi),%r8d ; OopMap{r8=NarrowOop off=121}
[info] ;*goto
[info] ; - scala.collection.immutable.List::length@23 (line 339)
[info]
[info] 0x0000000109c45079: test %eax,-0x22c007f(%rip) # 0x0000000107985000
[info] ;*goto
[info] ; - scala.collection.immutable.List::length@23 (line 339)
[info] ; {poll}
[info] 0x0000000109c4507f: mov 0x8(%r12,%r8,8),%r9d ; implicit exception: dispatches to 0x0000000109c450f1
[info] 0x0000000109c45084: cmp $0xf801ba85,%r9d ; {metadata('scala/collection/immutable/$colon$colon')}
[info] 0x0000000109c4508b: je 0x0000000109c45060
[info] 0x0000000109c4508d: cmp $0xf801903c,%r9d ; {metadata('scala/collection/immutable/Nil$')}
[info] 0x0000000109c45094: jne 0x0000000109c450c6 ;*ifne
[info] ; - scala.collection.immutable.List::length@8 (line 339)
[info] 0x0000000109c45060: lea (%r12,%r8,8),%rsi ;*invokevirtual isEmpty
[info] ; - scala.collection.immutable.List::length@5 (line 339)
...
...
[info] 0x0000000109c45060: lea (%r12,%r8,8),%rsi ;*invokevirtual isEmpty
[info] ; - scala.collection.immutable.List::length@5 (line 339)
...
...
[info] 0x0000000109c450c6: mov $0xffffffc6,%esi
[info] 0x0000000109c450cb: mov %eax,(%rsp)
[info] 0x0000000109c450ce: mov %r8d,0x4(%rsp)
[info] 0x0000000109c450d3: callq 0x0000000109a051a0 ; OopMap{[4]=NarrowOop off=216}
[info] ;*invokevirtual isEmpty
[info] ; - scala.collection.immutable.List::length@5 (line 339)
[info] ; {runtime_call}
Is it because .isEmpty is override in both Nil
and ::
, so calling .isEmpty
would be simply equal to comparison the list to Nil , instead of call .isEmpty
in the super type ? @mkeskells
I would look at a little benchmark - 3 method that does
- isEmpty
- eq Nil
- java based == a java final static field containing Nil
- maybe try isInstanceOf[Nil]
- .getClass eq classOf[Nil]
- anything else that you thin may be faster
test for performance, and look at the code generated. We are looking for the fastest!
I wonder why the MODULE$
fields aren't static final
in the first place. Shouldn't it be OK to make it final when it's initialized in a static initializer?
@szeiger https://github.com/rorygraves/scalac_perf/issues/19 relates - @retronym did some cases in https://github.com/scala/scala/pull/6523, but I don't think that can be applied to deeper hierarchies without more work. There is some discussion in that ticket
The problem is that a parent ctor can refer to the MODULE$
directly or indirectly, and it cant be null
, or we would endlessly recurse