fast-ruby
fast-ruby copied to clipboard
Parallel assignment is equal to sequential assignment
This one is probably the most confusing benchmark I've ever seen. Original idea was that sequential assignment is faster than parallel (https://speakerdeck.com/sferik/writing-fast-ruby?slide=45). Later @charliesome found (https://github.com/JuanitoFatas/fast-ruby/pull/50) that this benchmark was incorrect because Ruby creates new array with all variables when parallel assignment is last expression in a method. After recalculation parallel assignment became faster than sequential.
Which one is actually faster?
I claim that none of them. Parallel assignment is actually equal to sequential. In the first comment we can read: "In fact, this is even faster than splitting these assignments out over multiple lines because the compiler does not need to emit a per-line trace instruction for each assignment."
I wonder why nobody broaden this topic. "trace" instruction is emitted for every line of your ruby program. Not only for assignments, but for every line. Does it mean we should write our programs as a oneliners? Absolutely not! "trace" has very low overhead. But suppose you are paranoid... Surprise! Ruby VM (MRI) allows you to disable emitting this instructions completely. Here is the trick:
RubyVM::InstructionSequence.compile_option = {
trace_instruction: false
}
Put this one in a file "disable_trace.rb" and load it before loading your program (you can't load this in the same script because it's too late, your program is already compiled to bytecode).
$ ruby -r./disable_trace assignment.rb
Calculating -------------------------------------
Parallel Assignment 149.276k i/100ms
Sequential Assignment
149.712k i/100ms
-------------------------------------------------
Parallel Assignment 7.963M (± 5.2%) i/s - 39.707M
Sequential Assignment
7.947M (± 5.2%) i/s - 39.674M
Comparison:
Parallel Assignment: 7962929.1 i/s
Sequential Assignment: 7946966.7 i/s - 1.00x slower
This is how your bytecode actually looks like when you disable emitting "trace" instruction:
puts RubyVM::InstructionSequence.compile("a = 1;\nb = 2;", nil, nil, nil, trace_instruction: false).disasm
== disasm: #<ISeq:<compiled>@<compiled>>================================
local table (size: 3, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 3] a [ 2] b
0000 putobject_OP_INT2FIX_O_1_C_ ( 1)
0001 setlocal_OP__WC__0 3
0003 putobject 2 ( 2)
0005 dup
0006 setlocal_OP__WC__0 2
0008 leave
See? No trace instructions.
Summary:
- Sequential is not slower or faster than parallel, they are equal.
- Ruby adds "trace" instruction for every line of code, not only for assignments.
- Usually when you use multiple assignments you do it in multiple lines. This is why you might think that sequential assignment is slower than parallel. But remember: "Correlation does not imply causation".
- You can disable emitting "trace" instruction if you are paranoid.
- In practice it won't make too much difference if your program does something more seriously.
Also, small digression. It would be better to educate people WHY and WHEN something can be slower/faster rather than making such a strong statements like "X is faster than Y".
Thank you for this! I never considered the impact of the trace
instruction on microbenchmarks, thank you for putting all of this research together.
At this microbenchmarking scale, perhaps we should be disabling trace
instructions more often.
Also, small digression. It would be better to educate people WHY and WHEN something can be slower/faster rather than making such a strong statements like "X is faster than Y".
I agree that this repository could do a better job of this.
I wonder why nobody broaden this topic.
Just a couple of speed freaks working on this in our spare time mate :+1:
After thinking about this some more, I'm not sure why we should disable trace instructions when 99% of production Rubies will have them enabled. Shouldn't we be comparing 1-1 against most people's Ruby production settings? @radarek your thoughts?
I woulnd't disable it too. The point of my comment was that we shouldn't tell people that sequential assignment is slower than parallel. If we would do that then we should tell them also that code in multiple lines is slower than the same code in one line. It would be stupid of course ;-). I think this benchmark should be simply deleted. I don't think any real world code suffers from parallel vs sequential assignment problem or "trace" bytecode instrunction.
One more note about this benchmark. When I did this research I also checked how many local variables are used in "average" method (I used ripper
library to do that). I analyzed all 774 gems from ~/.rbenv/gems/2.2.0/gems/
directory (over 40k files). And here is the histogram:
# K: V (K - number of variables, V - number of methods that use K local variables)
0: 151113
1: 43408
2: 18114
3: 8435
4: 3902
5: 2271
6: 1104
7: 676
8: 399
9: 275
10: 160
11: 115
12: 91
13: 55
14: 22
15: 40
16: 22
17: 15
18: 25
19: 15
20: 4
21: 10
22: 3
23: 4
24: 4
25: 2
26: 4
27: 6
28: 1
29: 4
30: 14
31: 4
35: 1
36: 2
Even if we remove 0
entry (we don't care about method which don't use local variables) we see that 93% of a methods use 4 or less variables. Re-running original benchmark with only 4 local variables I get result:
Comparison:
Parallel Assignment: 6985531.4 i/s
Sequential Assignment: 6720164.3 i/s - 1.04x slower
(for 8 local variables I get "1.12x slower").
If we would do that then we should tell them also that code in multiple lines is slower than the same code in one line. It would be stupid of course ;-).
Right, now I understand.
I think this benchmark should be simply deleted. I don't think any real world code suffers from parallel vs sequential assignment problem or "trace" bytecode instrunction.
That's true. I think you make a good case, the benchmark should be removed.
We need a better policy on what constitutes a big enough difference to matter, especially, as you've pointed out, several of the benches suffer from simply measuring unrelated things like assignment, number of lines, etc.
Given the arguments put forth, I agree this benchmark should be removed. I will submit a PR shortly.
As for what gets into the repo, we also need to make things more uniform so that the benchmarks are more reliable.
It might be a while before that since we're trying to first clean out the repository of old issues.