Introduce mutating variant of -= (~1.22x faster)
Problem
Right now there is no way to perform an array difference operation without also allocating a new array. Array allocation can be expensive.
Why can't we mutate the caller in -=
We cannot mutate the caller because it may cause unexpected behavior if there were other assignments as well. For example:
array2 = array1 -= array3
array2 -= array4
If the -= call mutated, then array1 would be modified on the second line which would be surprising.
Proposed solution
Introduce a new mutating method on array that does not allocate an intermediate array and instead returns itself.
There is already a Array#difference method (which takes multiple arrays while Array#- takes a single array).
I'm currently proposing naming this new method Array#diff! it will mutate the caller, only accept a single array to diff against, and return itself.
The goal is to increase performance of -= like operations.
In these micro benchmarks:
$ cat test.rb
require 'benchmark'
b_array = [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 99]
n = 5_000_000
Benchmark.bm do |x|
x.report("Array#subtract!") { n.times do; a = [1, 2, 3]; a.subtract!(b_array); end }
x.report("Array#-= ") { n.times do; a = [1, 2, 3]; a -= b_array; end }
end
$ make runruby
generating vm_call_iseq_optimized.inc
vm_call_iseq_optimized.inc unchanged
RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems ./test.rb
user system total real
able-gems ./test.rb
user system total real
Array#subtract! 1.168282 0.002174 1.170456 ( 1.172227)
Array#-= 1.428345 0.004802 1.433147 ( 1.439785)
Array#subtract! is ~1.22x faster than -= in this micro benchmark.
Would it be consistent to call it Array#difference!? By the way looks great!
Thanks for the review @ioquatix !
Regarding the naming suggestion. That's originally what I thought too. The problem I ran into is that Array#difference can accept multiple arrays:
array.difference([1,2], [3,4])
If we were to introduce Array#difference! I think it should also accept the same method signature as the non-bang method or it would be confusing.
When I implemented difference! as a mutating variant it wasn't faster than Array#-=.
I'm not 100% sure why. My best guess is that methods with arbitrary-sized arguments are slower. However, there may have been a mistake in my implementation. Or maybe there's an optimization that I missed.
So that's why diff! is different than difference, because it only accepts a single argument.
I'm certainly open to naming suggestions. How do you feel about Array#subtract! ?
Can you share the branch with your implementation of difference! I can take a look if that’s helpful.
@ioquatix here's the difference! implementation (with benchmarks in the commit message): https://github.com/schneems/ruby/commit/1c8ebb23e8106b3e4d331949d8bab867e219ff71
That's a raw re-write of difference to be mutable.
It's worth noting that concat is also slower than += even though it mutates:
$ cat test.rb
require 'benchmark'
b_array = [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 99]
n = 5_000_000
Benchmark.bm do |x|
x.report("Array#concat") { n.times do; a = [1, 2, 3]; a.concat(b_array); end }
x.report("Array#+= ") { n.times do; a = [1, 2, 3]; a += b_array ; end }
end
$ make runruby
compiling version.c
generating vm_call_iseq_optimized.inc
vm_call_iseq_optimized.inc unchanged
linking miniruby
/bin/sh ./tool/ifchange "--timestamp=.rbconfig.time" rbconfig.rb rbconfig.tmp
rbconfig.rb unchanged
creating verconf.h
verconf.h updated
compiling loadpath.c
builtin_binary.inc updated
compiling builtin.c
linking static-library libruby.3.1-static.a
linking ruby
RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems ./test.rb
user system total real
Array#concat 0.785914 0.004957 0.790871 ( 0.794673)
Array#+= 0.752269 0.007611 0.759880 ( 0.767160)
If my hypothesis is true that this is because it accepts multiple arguments, I'm wondering if it's possible to specialize these functions somehow because the mutating function should be faster than the allocating function (in theory). I believe that problem may also affect other array methods that accept multiple arrays (beyond concat and difference).
Also FWIW I'm updating this PR to be Array#subtract! in the meantime.
This PR adds a new method. A new feature requires discussion in the bug tracker. Please create a ticket into bugs.ruby-lang.org
Thank you. I have created https://bugs.ruby-lang.org/issues/18395