truffleruby icon indicating copy to clipboard operation
truffleruby copied to clipboard

Raytracer benchmark is 34x slower than with MRI

Open vlazar opened this issue 4 years ago • 7 comments

I was playing with https://github.com/edin/raytracer benchmarks to see how different versions of Ruby perform as well as Crystal https://www.ruby-forum.com/t/ruby-raytracer-in-crystal/261230/3

I wonder if that has something to do with my environment or is there something specific to TruffleRuby/Raytracer benchmark which makes it so slow?

The results are for 2012 MacBook running MacOS Catalina and I'm using using asfd to install different Ruby versions.

% sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz

MRI

% ruby --version
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]
% ruby RayTracer.rb
Completed in 28840.049 ms

TruffleRuby

% ruby --version
truffleruby 21.1.0, like ruby 2.7.2, GraalVM CE Native [x86_64-darwin]
% ruby RayTracer.rb
Completed in 983755.151 ms

vlazar avatar Apr 26 '21 08:04 vlazar

Thanks very much for opening the issue. I can reproduce.

This is a performance bug in String#[]=, called from ImageRuby::RubyBitmapModl#set_pixel.

% bundle exec ruby --cpusampler --cpusampler.Output=calltree --cpusampler.Delay=10000 --cpusampler.Mode=roots RayTracer.rb > truffleruby.txt
Completed in 217357.88700000002 ms
-----------------------------------------------------------------------------------------------------------------------------------------------------
Sampling CallTree. Recorded 191260 samples with period 1ms.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
  Opt %: Percent of time spent in compiled and therefore non-interpreted code.
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Thread: Thread[main,5,main]
 Name                                                          |      Total Time     |  Opt % ||       Self Time     |  Opt % | Location             
-----------------------------------------------------------------------------------------------------------------------------------------------------
 <main>                                                        |     191260ms 100.0% |   0.0% ||          4ms   0.0% |   0.0% | RayTracer.rb~1-296:0-7635 
  RayTracer#render                                             |     191213ms 100.0% |   0.0% ||          0ms   0.0% |   0.0% | RayTracer.rb~259-265:6230-6603 
   Integer#times                                               |     191213ms 100.0% |   0.0% ||          5ms   0.0% |   0.0% | resource:/truffleruby/core/integer.rb~143-152:4359-4505 
    block in RayTracer#render                                  |     191208ms 100.0% |  11.9% ||          0ms   0.0% |   0.0% | RayTracer.rb~260-261:6284-6343 
     block (2 levels) in RayTracer#render                      |     191128ms  99.9% |  99.2% ||        134ms   0.1% |  79.1% | RayTracer.rb~261-264:6314-6589 
      ImageRuby::RubyBitmapModl#set_pixel                      |     189089ms  98.9% |  99.4% ||         76ms   0.0% |  38.2% | .vendor/truffleruby/2.7.2.1/gems/imageruby-0.2.5/lib/imageruby/bitmap/rbbitmap.rb~65-79:1772-2131 
       String#[]=                                              |     188969ms  98.8% |  99.6% ||         38ms   0.0% |  52.6% | resource:/truffleruby/core/string.rb~1033-1063:27359-28062 
        String#assign_index                                    |     188931ms  98.8% |  99.5% ||     188919ms  98.8% |  99.5% | resource:/truffleruby/core/string.rb~1065-1097:28065-28977 

I hacked around this.

diff --git a/ruby/RayTracer.rb b/ruby/RayTracer.rb
index 687e668..d49e972 100644
--- a/ruby/RayTracer.rb
+++ b/ruby/RayTracer.rb
@@ -257,11 +257,13 @@ class RayTracer
   end
 
   def render(scene, image, screenWidth, screenHeight)
+    pixels = []
     screenHeight.times do |y|
       screenWidth.times do |x|
         color = self.traceRay(Ray.new(scene.camera().pos, self.getPoint(x, y, screenWidth, screenHeight, scene.camera())), scene, 0)
         r,g,b = Color.toDrawingColor(color)
-        image.set_pixel(x,y, ImageRuby::Color.from_rgba(r,g,b, 255))
+        # image.set_pixel(x,y, ImageRuby::Color.from_rgba(r,g,b, 255))
+        pixels.push ImageRuby::Color.from_rgba(r,g,b, 255)
   end end end
 end

This fixed it and it's now a lot faster.

% bundle exec ruby --cpusampler --cpusampler.Output=calltree --cpusampler.Delay=10000 --cpusampler.Mode=roots RayTracer.rb > truffleruby2.txt
Completed in 7088.765 ms
-----------------------------------------------------------------------------------------------------------------------------------------------------
Sampling CallTree. Recorded 6023 samples with period 1ms.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
  Opt %: Percent of time spent in compiled and therefore non-interpreted code.
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Thread: Thread[main,5,main]
 Name                                                          |      Total Time     |  Opt % ||       Self Time     |  Opt % | Location             
-----------------------------------------------------------------------------------------------------------------------------------------------------
 <main>                                                        |       6023ms 100.0% |   0.0% ||          7ms   0.1% |   0.0% | RayTracer.rb~1-298:0-7712 
  RayTracer#render                                             |       5892ms  97.8% |   0.0% ||          0ms   0.0% |   0.0% | RayTracer.rb~259-267:6230-6680 
   Integer#times                                               |       5892ms  97.8% |   0.0% ||          2ms   0.0% |   0.0% | resource:/truffleruby/core/integer.rb~143-152:4359-4505 
    block in RayTracer#render                                  |       5890ms  97.8% |   3.7% ||          2ms   0.0% |   0.0% | RayTracer.rb~261-262:6300-6359 
     block (2 levels) in RayTracer#render                      |       5868ms  97.4% |  77.6% ||        182ms   3.0% |  28.0% | RayTracer.rb~262-266:6330-6666 
      RayTracer#traceRay                                       |       4820ms  80.0% |  76.9% ||         48ms   0.8% |  25.0% | RayTracer.rb~211-214:4125-4276 
       RayTracer#shade                                         |       4243ms  70.4% |  82.7% ||         94ms   1.6% |  59.6% | RayTracer.rb~216-224:4279-4831 
        RayTracer#getNaturalColor                              |       2780ms  46.2% |  79.2% ||         61ms   1.0% |  78.7% | RayTracer.rb~230-234:5001-5196 
         block in RayTracer#getNaturalColor                    |       2719ms  45.1% |  87.1% ||        144ms   2.4% |  88.2% | RayTracer.rb~232:5083-5180 
          RayTracer#addLight                                   |       2575ms  42.8% |  13.9% ||       1513ms  25.1% |   6.9% | RayTracer.rb~236-251:5199-5897 
           RayTracer#testRay                                   |        658ms  10.9% |  65.0% ||         49ms   0.8% |  55.1% | RayTracer.rb~206-209:4025-4122 
            RayTracer#intersections                            |        609ms  10.1% |  59.8% ||        156ms   2.6% |  61.5% | RayTracer.rb~193-204:3735-4022 
chrisseaton@Chriss-MacBook-Pro ruby % ruby -v
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-darwin20]
chrisseaton@Chriss-MacBook-Pro ruby % bundle exec ruby RayTracer.rb
Completed in 15496.454 ms
chrisseaton@Chriss-MacBook-Pro ruby % ruby -v
truffleruby 21.2.0-dev-ca823dac, like ruby 2.7.2, GraalVM CE JVM [x86_64-darwin]
chrisseaton@Chriss-MacBook-Pro ruby % bundle exec ruby RayTracer.rb
Completed in 5714.599 ms

Intuitively I can understand what this performance problem is, but I'll dig into it a bit more and hope to provide a deep-dive answer and fix.

chrisseaton avatar Apr 26 '21 09:04 chrisseaton

Mmh, so the image is represented as a long String and heavily mutated by String#[]= (used like a buffer)? https://github.com/tario/imageruby/blob/ba38c3e4a5c8c068566501f0e73357fd219535e4/lib/imageruby/bitmap/rbbitmap.rb#L65 Also that code could probably use setbyte for clarity and efficiency. One thing that would be worth trying is to use a native String there, that should handle being used as a buffer much better.

It seems a strange design honestly, and it wouldn't even be possible in languages with immutable strings. I would think an Array of Integer, like used in OptCarrot and many other places is way more efficient, both for CRuby and TruffleRuby. Also for a benchmark it seems best to avoid dependencies anyway, maybe we should open an issue/PR on that benchmark repo?

@chrisseaton Could you share the time for CRuby 2.7.2 without your change? I guess it's actually slower than with your change, and so it seems clear that a String image representation is just inefficient for all Ruby implementations.

eregon avatar Apr 26 '21 10:04 eregon

Without the change:

Completed in 17894.449 ms

At some point TruffleRuby needs to find a way to automatically figure out a string is being used as a buffer and optimise for it.

chrisseaton avatar Apr 26 '21 10:04 chrisseaton

So that's 1.15x (17894ms -> 15496ms) faster on CRuby when using an intuitive representation (Array) vs a String used like a buffer. So even on CRuby it seems problematic to use a String as a buffer / use String#[]=. But the effect is much worse on TruffleRuby. On the upside it makes it really easy to see the problem in the benchmark here.

automatically figure out a string is being used as a buffer and optimise for it.

Yes, that could be nice, probably the first []= or setbyte should migrate to a buffer-like representation (whether it's native or managed). A managed buffer-like String representation seems a high effort, so I would say it's not worth it just for this benchmark. It seems well-written Ruby code doesn't hit this issue often, so I'm not sure how much of an issue it is for actual production code.

FWIW the Crystal version seems to use an array (of RGBA structs), so https://www.ruby-forum.com/t/ruby-raytracer-in-crystal/261230 is just making an unfair comparison, and the only way to fix that part is to fix the Ruby version of the benchmark.

eregon avatar Apr 26 '21 10:04 eregon

We actually have a similar issue with setbyte: https://github.com/oracle/truffleruby/issues/2044#issuecomment-668524452 And some discussion there on possible fixes.

eregon avatar Apr 26 '21 11:04 eregon

FWIW the Crystal version seems to use an array (of RGBA structs), so https://www.ruby-forum.com/t/ruby-raytracer-in-crystal/261230 is just making an unfair comparison, and the only way to fix that part is to fix the Ruby version of the benchmark.

@eregon this looks like a WIP to replace imagery gem for Ruby implementation of benchmark https://github.com/edin/raytracer/blob/master/ruby/Image.rb#L77

vlazar avatar Apr 27 '21 10:04 vlazar

I've got some preliminary fixes in https://github.com/oracle/truffleruby/pull/2341.

I'm then going to focus on making this benchmark faster.

require 'benchmark/ips'

puts RUBY_DESCRIPTION

image = '0' * 10_000

Benchmark.ips do |x|
  x.report('alloc') do
    image = '0' * 10_000
    image.size.times do |n|
      image[n] = (n % 100).chr
    end
  end

  x.report('no-alloc') do
    image.size.times do |n|
      image[n] = (n % 100).chr
    end
  end
end
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-darwin20]
Warming up --------------------------------------
               alloc    48.000  i/100ms
            no-alloc    47.000  i/100ms
Calculating -------------------------------------
               alloc    510.498  (± 4.3%) i/s -      2.592k in   5.088010s
            no-alloc    535.121  (± 1.3%) i/s -      2.679k in   5.007249s

truffleruby 21.2.0-dev-13f2d44f, like ruby 2.7.2, GraalVM CE JVM [x86_64-darwin]
Warming up --------------------------------------
               alloc     5.000  i/100ms
            no-alloc     5.000  i/100ms
Calculating -------------------------------------
               alloc    104.189  (±15.4%) i/s -    500.000  in   5.008380s
            no-alloc    106.172  (±14.1%) i/s -    510.000  in   4.998794s

TruffleRuby does actually do some work to try to flatten big concat-trees that works well for this case... but it still always allocates a new node for each write which is not what we want.

chrisseaton avatar May 03 '21 02:05 chrisseaton