
Investigate Performance Regression with Scala 2.13.0

Open desmondyeung opened this issue 4 years ago • 2 comments

bench/jmh:run -i 3 -wi 3 -f1 XxHash64Bench.com_desmondyeung_hashing

2.12.9

[info] Benchmark                               (inputSize)   Mode  Cnt          Score         Error  Units
[info] XxHash64Bench.com_desmondyeung_hashing            8  thrpt    3  185778619.019 ± 1934781.573  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing          128  thrpt    3   56514066.010 ±  634268.161  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing          512  thrpt    3   23600141.441 ±  810006.785  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         1024  thrpt    3   13299685.393 ±  717831.068  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         1536  thrpt    3    9246722.872 ±  260972.021  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         2048  thrpt    3    7056145.724 ±  265236.056  ops/s
[success] Total time: 184 s, completed Aug 21, 2019, 3:15:13 PM

2.13.0

[info] Benchmark                               (inputSize)   Mode  Cnt          Score         Error  Units
[info] XxHash64Bench.com_desmondyeung_hashing            8  thrpt    3  184298585.739 ± 2062166.900  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing          128  thrpt    3   52045201.983 ± 3229937.845  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing          512  thrpt    3   21511399.488 ±  311140.849  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         1024  thrpt    3   11933307.339 ±  745263.751  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         1536  thrpt    3    8226233.328 ±  404379.004  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         2048  thrpt    3    6321480.430 ±  312725.166  ops/s
[success] Total time: 190 s, completed Aug 21, 2019, 3:11:30 PM

desmondyeung avatar Aug 21 '19 20:08 desmondyeung
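
To put the two tables side by side, the relative change per input size can be computed directly from the posted scores. This is a small sketch: the `RelativeChange` object and `percentChange` helper are illustrative names and not part of the library. The scores are copied from the runs above. With only 3 measurement iterations per run, the results should be read against the reported error column.

```scala
object RelativeChange {
  // Throughput scores (ops/s) copied from the benchmark output above, keyed by inputSize.
  val scala212: Map[Int, Double] = Map(
    8    -> 185778619.019,
    128  -> 56514066.010,
    512  -> 23600141.441,
    1024 -> 13299685.393,
    1536 -> 9246722.872,
    2048 -> 7056145.724
  )

  val scala213: Map[Int, Double] = Map(
    8    -> 184298585.739,
    128  -> 52045201.983,
    512  -> 21511399.488,
    1024 -> 11933307.339,
    1536 -> 8226233.328,
    2048 -> 6321480.430
  )

  // Signed percentage change of `candidate` relative to `baseline`;
  // negative values mean lower throughput than the baseline.
  def percentChange(baseline: Double, candidate: Double): Double =
    (candidate - baseline) / baseline * 100.0

  def main(args: Array[String]): Unit =
    scala212.keys.toSeq.sorted.foreach { size =>
      val change = percentChange(scala212(size), scala213(size))
      println(f"$size%5d  $change%+.2f%%")
    }
}
```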

@desmondyeung have you considered inlining all prime constants manually?

plokhotnyuk avatar Aug 23 '19 13:08 plokhotnyuk

@plokhotnyuk yes, I originally did not inline them with 2.12.9 because I found that doing so made performance much worse. It seems that inlining them with 2.13.0 does help with larger inputs, but it is still slower than 2.12.9 without inlining.

2.12.9 inlined prime constants

[info] Benchmark                               (inputSize)   Mode  Cnt          Score          Error  Units
[info] XxHash64Bench.com_desmondyeung_hashing            8  thrpt    3  166010140.218 ± 13125301.638  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing          128  thrpt    3   49034132.659 ±   933867.058  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing          512  thrpt    3   19074074.413 ±  1103220.964  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         1024  thrpt    3   10842131.559 ±   757732.072  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         1536  thrpt    3    7495990.225 ±  2007210.588  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         2048  thrpt    3    5684602.006 ±   105917.575  ops/s
[success] Total time: 118 s, completed Aug 23, 2019, 3:24:31 PM

2.13.0 inlined prime constants

[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                               (inputSize)   Mode  Cnt          Score         Error  Units
[info] XxHash64Bench.com_desmondyeung_hashing            8  thrpt    3  156177235.277 ± 7133756.881  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing          128  thrpt    3   51057573.056 ± 1136897.911  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing          512  thrpt    3   22916472.679 ± 1689274.899  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         1024  thrpt    3   12553725.655 ±  881055.406  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         1536  thrpt    3    9083090.027 ±  496895.445  ops/s
[info] XxHash64Bench.com_desmondyeung_hashing         2048  thrpt    3    7008018.944 ±  261122.178  ops/s
[success] Total time: 118 s, completed Aug 23, 2019, 3:13:59 PM

with final prime constants:

  public final long hashBytes(byte[], long, int, long);
    Code:
       0: lload_2
       1: lstore        9
       3: iload         4
       5: istore        11
       7: iload         4
       9: bipush        32
      11: if_icmplt     359
      14: lload         5
      16: ldc2_w        #48                 // long -7046029288634856825l
      19: ladd
      20: ldc2_w        #51                 // long -4417276706812531889l
      23: ladd
      24: lstore        12
      26: lload         5
      28: ldc2_w        #51                 // long -4417276706812531889l
      31: ladd
      32: lstore        14
      34: lload         5
      36: lstore        16
      38: lload         5
      40: ldc2_w        #48                 // long -7046029288634856825l

with non-final prime constants:

  public final long hashBytes(byte[], long, int, long);
    Code:
       0: lload_2
       1: lstore        9
       3: iload         4
       5: istore        11
       7: iload         4
       9: bipush        32
      11: if_icmplt     227
      14: lload         5
      16: aload_0
      17: invokevirtual #111                // Method Prime1:()J
      20: ladd
      21: aload_0
      22: invokevirtual #104                // Method Prime2:()J
      25: ladd
      26: lstore        12
      28: lload         5
      30: aload_0
      31: invokevirtual #104                // Method Prime2:()J
      34: ladd
      35: lstore        14
      37: lload         5
      39: lstore        16
      41: lload         5
      43: aload_0
      44: invokevirtual #111                // Method Prime1:()J

desmondyeung avatar Aug 23 '19 19:08 desmondyeung
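
For readers unfamiliar with the distinction behind the two bytecode listings above, here is a minimal Scala sketch. It assumes the constants are declared in an object. The object and method names are hypothetical, not the library's actual source, and only the two literal values are taken from the `ldc2_w` comments above. A `final val` with no type annotation and a literal right-hand side is a constant value definition, which scalac substitutes at each use site. A plain `val` is typically read through its accessor method.

```scala
object ConstantInliningSketch {
  // Constant value definitions: `final`, no type annotation, literal right-hand side.
  // The compiler replaces each use with the literal (an `ldc2_w` in the bytecode).
  final val Prime1 = -7046029288634856825L
  final val Prime2 = -4417276706812531889L

  // Plain vals: stored in a field and read through a generated accessor,
  // which appears as an `invokevirtual` call at each use site.
  val Prime1Field: Long = -7046029288634856825L
  val Prime2Field: Long = -4417276706812531889L

  // Hypothetical stand-ins for the accumulator setup seen in the listings.
  def initInlined(seed: Long): Long = seed + Prime1 + Prime2

  def initViaAccessor(seed: Long): Long = seed + Prime1Field + Prime2Field
}
```

Both methods return the same value. The difference is only in the emitted bytecode, which can be inspected by running `javap -c` on the compiled class. Whether the accessor calls cost anything after JIT compilation is exactly what the benchmarks in this thread try to measure.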