New Julia benchmark

Open PallHaraldsson opened this issue 2 years ago • 12 comments

First, you might want to benchmark Julia on master as is (or possibly on the next nightly; I just noticed yet another improvement merged just now, "Remove alloca from codegen").

I don't know whether the issue with your very unusual benchmark is fixed. But Julia uses -O2 by default, so you might also want to try running with -O0 (or --inline=no, which I think is at least implied by the lowest level) or -O1, or even with --compile=min. There is no Julia debug/development-build mode, and those are the closest I can think of.

If you do see an improvement, there's a further 25% improvement available, but you have to opt into the new Julia parser. It will be merged into Julia, but at first it will be off by default:

https://github.com/JuliaLang/JuliaSyntax.jl/pull/228

I also wanted to point that out for D (or any other language).

PallHaraldsson avatar Apr 10 '23 14:04 PallHaraldsson

Note that the script is already using

JULIA_INTERPRET_FLAGS = ['--compile=min']  # See: https://github.com/JuliaLang/julia/issues/41360#issuecomment-872075102
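
For reference, the flag variants mentioned so far could be timed against each other roughly as follows. This is only a sketch, not the benchmark script's actual code: the helper names (`julia_command`, `time_command`, `compare_variants`), the variant labels, and the `generated.jl` file name are hypothetical, and it assumes a `julia` executable on `PATH`. The flags themselves (`--compile=min`, `-O0`, `-O1`, `--inline=no`) are the ones named in this thread.

```python
import subprocess
import time

# Flag variants discussed in this thread. The dictionary keys are
# hypothetical labels chosen for this sketch.
JULIA_FLAG_VARIANTS = {
    "compile-min": ["--compile=min"],
    "O0": ["-O0"],
    "O1": ["-O1"],
    "default": [],  # no flags: Julia's default optimization level (-O2)
    "no-inline": ["--inline=no"],
}


def julia_command(flags, source_path, exe="julia"):
    """Build the argv list for running source_path with the given flags."""
    return [exe, *flags, source_path]


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        check=True,  # raise if the process exits with a non-zero status
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def compare_variants(source_path, variants=JULIA_FLAG_VARIANTS):
    """Time one run per flag variant; returns {label: seconds}."""
    return {
        name: time_command(julia_command(flags, source_path))
        for name, flags in variants.items()
    }
```

Calling `compare_variants("generated.jl")` would then give one wall-clock number per variant. A single run per variant is noisy, so repeating each run and taking the minimum would be a reasonable refinement.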

nordlow avatar Apr 11 '23 09:04 nordlow

I reran with Julia master and got:

| Lang-uage | Temp-lated | Check Time [us/fn] | Compile Time [us/fn] | Build Time [us/fn] | Run Time [us/fn] | Check RSS [kB/fn] | Build RSS [kB/fn] | Exec Version | Exec Path |
| :-------: | ---------- | :----------------: | :------------------: | :----------------: | :--------------: | :---------------: | :---------------: | :----------: | :-------: |
| D | No | 7.6 (3.7x) | 17.1 (10.7x) | 21.2 (12.0x) | 78 (4.0x) | 4.4 (9.8x) | 13.4 (30.1x) | v2.103.0-rc.1-87-g7e84fb3333-dirty | dmd |
| D | No | 5.0 (2.4x) | 91.6 (57.1x) | 92.4 (52.4x) | 325 (16.6x) | 4.8 (10.7x) | 19.8 (44.3x) | 1.30.0 | ldmd2 |
| D | No | 7.3 (3.5x) | 232.8 (145.2x) | 231.3 (131.0x) | 64 (3.2x) | 4.6 (10.3x) | 19.2 (43.1x) | 11.3.0 | gdc |
| D | Yes | 19.9 (9.6x) | 32.2 (20.1x) | 36.0 (20.4x) | 49 (2.5x) | 12.6 (27.8x) | 22.0 (49.4x) | v2.103.0-rc.1-87-g7e84fb3333-dirty | dmd |
| D | Yes | 10.5 (5.1x) | 97.7 (61.0x) | 100.0 (56.7x) | 272 (13.9x) | 12.9 (28.6x) | 28.9 (64.8x) | 1.30.0 | ldmd2 |
| D | Yes | 13.3 (6.5x) | 244.7 (152.7x) | 241.7 (136.9x) | 62 (3.1x) | 13.4 (29.6x) | 28.8 (64.6x) | 11.3.0 | gdc |
| C | No | 2.1 (best) | 1.6 (best) | 1.8 (best) | 20 (best) | 0.5 (best) | 0.4 (best) | 0.9.27 | tcc |
| C | No | 9.4 (4.6x) | 293.4 (183.1x) | 303.0 (171.6x) | 36 (1.9x) | 2.7 (6.0x) | 13.6 (30.6x) | 12.1.0 | gcc |
| C | No | 5.9 (2.9x) | 207.8 (129.7x) | 203.7 (115.4x) | 60 (3.1x) | 2.7 (6.1x) | 14.2 (31.7x) | 9.5.0 | gcc-9 |
| C | No | 6.1 (3.0x) | 217.8 (135.9x) | 219.4 (124.3x) | 37 (1.9x) | 2.7 (6.1x) | 14.2 (31.8x) | 10.4.0 | gcc-10 |
| C | No | 6.7 (3.3x) | 228.2 (142.4x) | 221.2 (125.3x) | 38 (1.9x) | 2.6 (5.9x) | 14.1 (31.7x) | 11.3.0 | gcc-11 |
| C | No | 10.1 (4.9x) | 298.7 (186.4x) | 299.2 (169.5x) | 23 (1.1x) | 2.8 (6.2x) | 13.6 (30.6x) | 12.1.0 | gcc-12 |
| C | No | 18.1 (8.8x) | 119.7 (74.7x) | 120.6 (68.3x) | 612 (31.2x) | 2.1 (4.6x) | sampling error | 14.0.0-1 | clang |
| C | No | 18.1 (8.8x) | 115.6 (72.1x) | 118.6 (67.2x) | 545 (27.8x) | 2.1 (4.6x) | 9.4 (21.1x) | 14.0.0-1 | clang-14 |
| C++ | No | 14.3 (7.0x) | 233.5 (145.7x) | 233.9 (132.5x) | 38 (1.9x) | 4.4 (9.7x) | 14.0 (31.5x) | 11.3.0 | g++ |
| C++ | No | 14.3 (6.9x) | 229.4 (143.1x) | 232.4 (131.7x) | 34 (1.7x) | 4.4 (9.7x) | 14.1 (31.5x) | 10.4.0 | g++-10 |
| C++ | No | 14.1 (6.8x) | 228.5 (142.6x) | 236.8 (134.1x) | 37 (1.9x) | 4.4 (9.7x) | 14.0 (31.5x) | 11.3.0 | g++-11 |
| C++ | No | 23.1 (11.2x) | 315.3 (196.8x) | 318.3 (180.3x) | 65 (3.3x) | sampling error | 16.4 (36.8x) | 12.1.0 | g++-12 |
| C++ | No | 26.0 (12.6x) | 128.9 (80.4x) | 127.9 (72.5x) | 541 (27.6x) | 2.2 (4.8x) | 9.4 (21.1x) | 14.0.0-1 | clang |
| C++ | No | 25.2 (12.2x) | 129.4 (80.7x) | 132.7 (75.2x) | 541 (27.6x) | 2.2 (4.8x) | 9.4 (21.1x) | 14.0.0-1 | clang-14 |
| C++ | Yes | 30.5 (14.8x) | 278.3 (173.6x) | 277.9 (157.5x) | 28 (1.4x) | 8.0 (17.7x) | 20.5 (46.0x) | 11.3.0 | g++ |
| C++ | Yes | 30.9 (15.0x) | 278.2 (173.6x) | 279.6 (158.4x) | 27 (1.4x) | 8.0 (17.6x) | 21.8 (48.9x) | 10.4.0 | g++-10 |
| C++ | Yes | 29.1 (14.1x) | 281.9 (175.9x) | 280.7 (159.0x) | 27 (1.4x) | 8.0 (17.7x) | 20.6 (46.1x) | 11.3.0 | g++-11 |
| C++ | Yes | 41.7 (20.3x) | 371.1 (231.6x) | 366.9 (207.8x) | 26 (1.3x) | 8.0 (17.7x) | 20.6 (46.1x) | 12.1.0 | g++-12 |
| C++ | Yes | 40.0 (19.4x) | 129.5 (80.8x) | 134.5 (76.2x) | 381 (19.5x) | 4.0 (8.8x) | 12.6 (28.3x) | 14.0.0-1 | clang |
| C++ | Yes | 39.1 (19.0x) | 132.9 (82.9x) | 136.4 (77.3x) | 622 (31.7x) | 4.0 (8.8x) | 12.6 (28.3x) | 14.0.0-1 | clang-14 |
| Ada | No | N/A | N/A | 943.7 (534.7x) | 68 (3.5x) | N/A | 31.3 (70.2x) | 12.1.0 | gnat |
| Ada | No | N/A | N/A | 950.3 (538.4x) | 69 (3.5x) | N/A | 31.4 (70.3x) | 12.1.0 | gnat-12 |
| Go | No | 16.0 (7.8x) | N/A | N/A | N/A | 4.0 (8.9x) | N/A | 1.18.3 | gotype |
| N/A | N/A | N/A | N/A | N/A | N/A | 6.5 (14.5x) | 24.3 (54.4x) | N/A | N/A |
| N/A | N/A | N/A | N/A | N/A | N/A | 11.2 (24.8x) | 23.5 (52.7x) | N/A | N/A |
| Go | No | N/A | N/A | 166.0 (94.0x) | 132 (6.7x) | N/A | 28.3 (63.4x) | 1.18.3 | go |
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | 18.4 (41.1x) | N/A | N/A |
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | 50.3 (112.8x) | N/A | N/A |
| Zig | No | 22.5 (10.9x) | N/A | 531.6 (301.2x) | 1150 (58.7x) | 5.6 (12.5x) | 34.8 (78.1x) | 0.11.0-dev.2545+311d50f9d | zig |
| Zig | Yes | 27.2 (13.2x) | N/A | 547.6 (310.2x) | 1123 (57.3x) | 5.6 (12.5x) | 35.9 (80.5x) | 0.11.0-dev.2545+311d50f9d | zig |
| Rust | No | 73.5 (35.7x) | N/A | 230.6 (130.6x) | 1474 (75.2x) | 13.6 (30.1x) | 29.7 (66.6x) | 1.70.0-nightly | rustc |
| Rust | Yes | 84.9 (41.2x) | N/A | 148.9 (84.4x) | 1442 (73.6x) | 15.7 (34.8x) | 18.6 (41.6x) | 1.70.0-nightly | rustc |
| Nim | No | 36.7 (17.8x) | N/A | 80.5 (45.6x) | 66 (3.3x) | 4.2 (9.3x) | 8.0 (18.0x) | 1.4.6 | nim |
| C# | No | N/A | N/A | 21.6 (12.2x) | 384 (19.6x) | N/A | 4.4 (9.8x) | 6.12.0.182 | mcs |
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | 13.2 (29.6x) | N/A | N/A |
| OCaml | No | N/A | N/A | 445.5 (252.4x) | 637 (32.5x) | N/A | 34.6 (77.5x) | 4.13.1 | ocamlopt |
| OCaml | No | N/A | N/A | 87.6 (49.6x) | 907 (46.3x) | N/A | 17.7 (39.6x) | 4.13.1 | ocamlc |
| Julia | No | N/A | N/A | 410.5 (232.6x) | N/A | N/A | 25.6 (57.4x) | 1.10.0-DEV | julia |
| Julia | Yes | N/A | N/A | 335.6 (190.1x) | N/A | N/A | 25.4 (56.8x) | 1.10.0-DEV | julia |

nordlow avatar Apr 11 '23 13:04 nordlow

Since the script already uses --compile=min, you could alternatively drop it to see whether the default is better, or try e.g. -O0.

Anyway, it's at least going in the right direction. And 1.8.0-DEV is of course very outdated, and I expect 1.9.0 to be released in a week or so, so it's time for 1.10.0-DEV.

PallHaraldsson avatar Apr 11 '23 13:04 PallHaraldsson

Using -O0 is slower than --compile=min. I checked.

nordlow avatar Apr 11 '23 15:04 nordlow

Closing this.

nordlow avatar Apr 11 '23 15:04 nordlow

Good to know about -O0 (also slower with the default -O2, or -O1?). You can get 25% faster parsing with JuliaSyntax.jl, but since it likely wasn't the bottleneck (your call to check, or decide to use that non-default option), I guess you can ignore it.

PallHaraldsson avatar Apr 12 '23 14:04 PallHaraldsson

They did fix constprop to be faster, but there's no way to drop that optimization completely. Doing away with it, or with all optimization, doesn't seem like a priority, because you don't compile code that often. In Julia 1.9, packages are fully precompiled to native code. [It would be an option to change your code into a package/module, but I don't think a module alone will do it, and I think you want to test the actual compilation time, not ways to get around it.]

You could at least update the actual README with the latest numbers, as you did in the table above. I might look into this extra 25% speedup; I understand if it's not a priority for you, and I'm not sure it is for me (i.e. for this benchmark).

PallHaraldsson avatar Apr 12 '23 14:04 PallHaraldsson

FYI: I can confirm that with JuliaSyntax.jl I get 21% faster (it's easy to use, but for the benchmark as is it needs to be compiled into the sysimage).

Possibly you should also try compiling the code for the other languages with optimizations on, i.e. -O2 (or -O3?), for a fair comparison with Julia on its defaults? It might at least be useful to see two tables, i.e. add another one for that.

PallHaraldsson avatar Apr 12 '23 16:04 PallHaraldsson

FYI "Add native UTF-8 Validation using fast shift based DFA #47880" was just merged and it seems 20x faster.

I'm not actually sure whether the parser uses it, but instead of looking into it, we can see whether the parser gets faster in the next nightly. So you may want to wait before publishing new results. [I only see the new parser calling isvalid for individual Chars, not Strings; what do you think Dlang does?]

PallHaraldsson avatar Apr 12 '23 21:04 PallHaraldsson

Hi,

I think you have a long benchmark (or so I recall, maybe only after inlining). I think this might be relevant (to test once it's merged to master):

https://github.com/JuliaLang/julia/pull/50756

PallHaraldsson avatar Aug 01 '23 22:08 PallHaraldsson

Can you perform the benchmark yourself?

nordlow avatar Aug 02 '23 15:08 nordlow

I can, and did (now that that PR was merged).

I do get a 12% improvement over 1.9.2, though that's not the great improvement I was hoping for, and the PR didn't help. I.e. I get similar numbers on the beta, where I believe the PR isn't included.

$ juliaup default dev
..
| Lang-uage | Temp-lated | Check Time [us/fn] | Compile Time [us/fn] | Build Time [us/fn] | Run Time [us/fn] | Check RSS [kB/fn] | Build RSS [kB/fn] | Exec Version | Exec Path | 
| :-------: | ---------- | :----------------: | :------------------: | :----------------: | :--------------: | :---------------: | :---------------: | :----------: | :-------: | 
| Julia     | No         | N/A                | N/A                  |  585.9 (1.2x)      | N/A              | N/A               |   31.9 (1.1x)     | 1.11.0-DEV   | julia     | 
| Julia     | Yes        | N/A                | N/A                  |  489.9 (best)      | N/A              | N/A               |   28.8 (best)     | 1.11.0-DEV   | julia     | 

vs. 554.7 on 1.9.2. I also tried all settings for JULIA_INTERPRET_FLAGS and JULIA_COMPILE_FLAGS. I.e. defaults are still much slower, though maybe some improvement there too.
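
As a quick check of the arithmetic, here is a small sketch that recomputes the relative improvement from the numbers quoted above. It assumes the 554.7 figure for 1.9.2 is the build time for the templated case, so that it is comparable with the 489.9 row; the function name `percent_improvement` is just an illustrative helper.

```python
def percent_improvement(old, new):
    """Relative reduction from old to new, as a percentage of old."""
    return (old - new) / old * 100.0


# Build times in us/fn, taken from this comment.
# Assumption: 554.7 (Julia 1.9.2) is the templated build time.
baseline_1_9_2 = 554.7
dev_templated = 489.9

improvement = percent_improvement(baseline_1_9_2, dev_templated)
print(f"{improvement:.1f}% faster than 1.9.2")
```

That comes out a little under 12%, in line with the figure quoted above.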

PallHaraldsson avatar Aug 04 '23 19:08 PallHaraldsson