truffleruby icon indicating copy to clipboard operation
truffleruby copied to clipboard

segv running Rails Simple Benchmark's Rails app

Open headius opened this issue 5 years ago • 8 comments

I am playing with the Rails Simple Benchmark to get some performance information across Ruby impls. Running TruffleRuby 20.2 today I had the server crash with a segv.

The app was set up using the "common" gemfile from https://github.com/noahgibbs/rsb. Using TruffleRuby from GraalVM 20.2, in --jvm mode, one process with 16 threads, hitting just the "static" text rendering endpoint using the "wrk" benchmark tool with 1 thread and 10 connections.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000115a1f08f, pid=89642, tid=32527
#
# JRE version: OpenJDK Runtime Environment GraalVM CE 20.2.0 (11.0.8+10) (build 11.0.8+10-jvmci-20.2-b03)
# Java VM: OpenJDK 64-Bit Server VM GraalVM CE 20.2.0 (11.0.8+10-jvmci-20.2-b03, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# J 18217 jvmci org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.profiledPERoot([Ljava/lang/Object;)Ljava/lang/Object; jdk.internal.vm.compiler (48 bytes) @ 0x0000000115a1f08f [0x0000000115a1f020+0x000000000000006f] (ActiveSupport::Logger#add#2)

The hs_err file is attached. hs_err_pid89642.log

The crash came shortly after leaving the server idle for 15-20 minutes and then resuming the benchmark. Performance dropped to a tenth of what it was, continued to degrade and produce read, write, and timeout errors, and then finally the server crashed.

headius avatar Aug 27 '20 19:08 headius

Another run with a 16-thread server and flags provided by @eregon also appears to have crashed.

Command line for server: RUBYOPT='--jvm --experimental-options --cexts-lock=false --engine.CompilerThreads=-1 --engine.SplittingGrowthLimit=10.0 --engine.SplittingMaxNumberOfSplitNodes=200000000' bundle exec puma -e production -p 4321

Command line for wrk 10s loop: ruby -e 'loop { system "wrk -c 8 -t 1 -d 10s http://localhost:4321/static" }'

headius avatar Aug 27 '20 21:08 headius

I should note that before I killed the server this most recent time, concurrent requests/s had dropped to under 100 and it was using nearly 5GB of memory. CPU usage was pegged at 100% while the benchmark ran.

This is all run on MacOS using the GraalVM 20.2 release.

headius avatar Aug 27 '20 21:08 headius

Additional attempt to run concurrent benchmark against server produced the same result... some dozens of 10s runs work, steadily improving performance, it stabilizes for a little while, and then performance starts to degrade with more errors returned from server.

I also had a thread dump when I tried to quit this server process, which may or may not be related (attached).

thread_dump.txt

headius avatar Aug 27 '20 22:08 headius

So far, I've been unable to reproduce. I used the following steps. Does this look good @headius ? What are the steps for running the wrk loop to reproduce the error: just let it run in a loop till the error occurs or should I start/stop it?

$ rbenv shell truffleruby+graalvm-20.2.0
$ git clone [email protected]:noahgibbs/rsb.git
$ cd rsb/rails_test_app
$ export BUNDLE_GEMFILE=Gemfile.common
$ export RSB_EXTRA_GEMFILES=Gemfile.puma
$ bundle install
$ RUBYOPT='--jvm --experimental-options --cexts-lock=false --engine.CompilerThreads=-1 --engine.SplittingGrowthLimit=10.0 --engine.SplittingMaxNumberOfSplitNodes=200000000' bundle exec puma -e production -p 4321

# Using MRI 2.6.6
$ ruby -e 'loop { system "wrk -c 8 -t 1 -d 10s http://localhost:4321/static" }'

@eregon I did get an error on startup with the following options and truffleruby+graalvm-20.2.0 (but do not get this error on a dev version truffleruby 20.3.0-dev-d42bff90, like ruby 2.6.6, GraalVM CE JVM [x86_64-darwin]:

% RUBYOPT='--jvm' bundle exec puma -e production -p 4321 
*** SIGUSR1 not implemented, signal based restart unavailable!
Puma starting in single mode...
* Version 3.11.4 (truffleruby 20.2.0 - ruby 2.6.6), codename: Love Song
* Min threads: 0, max threads: 16
* Environment: production
BigDecimal.new is deprecated; use BigDecimal() method instead.
* Listening on tcp://0.0.0.0:4321
Use Ctrl-C to stop
2020-08-28 09:52:07 -0500: Rack app error handling request { GET /static }
#<RuntimeError: makeSharedShape() can only be called on non-shared shapes. (UnsupportedOperationException)
	from com.oracle.truffle.object.ShapeImpl.makeSharedShape(ShapeImpl.java:1167)
	from com.oracle.truffle.object.DynamicObjectLibraryImplFactory$MakeSharedNodeGen$Uncached.execute(DynamicObjectLibraryImplFactory.java:366)
	from com.oracle.truffle.object.DynamicObjectLibraryImpl.markShared(DynamicObjectLibraryImpl.java:281)
	from com.oracle.truffle.object.DynamicObjectLibraryImplGen$DynamicObjectLibraryExports$Uncached.markShared(DynamicObjectLibraryImplGen.java:1068)
	from com.oracle.truffle.api.object.DynamicObjectLibraryGen$UncachedDispatch.markShared(DynamicObjectLibraryGen.java:1725)
	from org.truffleruby.language.objects.shared.SharedObjects.share(SharedObjects.java:150)
	from org.truffleruby.language.objects.shared.SharedObjects.shareObjects(SharedObjects.java:100)
	from org.truffleruby.language.objects.shared.SharedObjects.shareInternalFields(SharedObjects.java:165)
	from org.truffleruby.language.objects.shared.ShareInternalFieldsNode.shareUncached(ShareInternalFieldsNode.java:121)
	from org.truffleruby.language.objects.shared.ShareInternalFieldsNodeGen.executeShare(ShareInternalFieldsNodeGen.java:133)
	from org.truffleruby.language.objects.shared.ShareObjectNode.shareCached(ShareObjectNode.java:79)
	from org.truffleruby.language.objects.shared.ShareObjectNodeGen.executeShare(ShareObjectNodeGen.java:50)
	from org.truffleruby.language.objects.shared.WriteBarrierNode.writeBarrierCached(WriteBarrierNode.java:57)
	from org.truffleruby.language.objects.shared.WriteBarrierNodeGen.executeWriteBarrier(WriteBarrierNodeGen.java:64)
	from org.truffleruby.language.objects.WriteObjectFieldNode.writeExistingField(WriteObjectFieldNode.java:74)
	from org.truffleruby.language.objects.WriteObjectFieldNodeGen.executeWithGeneralize(WriteObjectFieldNodeGen.java:56)
	from org.truffleruby.language.objects.WriteObjectFieldNode.write(WriteObjectFieldNode.java:49)
	from org.truffleruby.language.objects.WriteInstanceVariableNode.execute(WriteInstanceVariableNode.java:49)
	from org.truffleruby.language.control.OrLazyValueDefinedNode.execute(OrLazyValueDefinedNode.java:70)
	from org.truffleruby.language.defined.DefinedWrapperNode.execute(DefinedWrapperNode.java:32)
	from org.truffleruby.language.control.SequenceNode.execute(SequenceNode.java:36)
	from org.truffleruby.language.arguments.CheckArityNode.execute(CheckArityNode.java:41)
	from org.truffleruby.language.methods.CatchForMethodNode.execute(CatchForMethodNode.java:42)
	from org.truffleruby.language.methods.ExceptionTranslatingNode.execute(ExceptionTranslatingNode.java:33)
	from org.truffleruby.language.LazyRubyNode.execute(LazyRubyNode.java:47)
	from org.truffleruby.language.RubyRootNode.execute(RubyRootNode.java:60)
	from org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:503)
Translated to internal error>
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/railties-4.2.11/lib/rails/engine.rb:514:in `env_config'
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/railties-4.2.11/lib/rails/engine.rb:514:in `call'
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/railties-4.2.11/lib/rails/application.rb:165:in `call'
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/puma-3.11.4/lib/puma/configuration.rb:225:in `call'
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/puma-3.11.4/lib/puma/server.rb:632:in `handle_request'
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/puma-3.11.4/lib/puma/server.rb:446:in `process_client'
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/puma-3.11.4/lib/puma/server.rb:306:in `block in run'
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/puma-3.11.4/lib/puma/thread_pool.rb:120:in `call'
/Users/bfish/.rbenv/versions/truffleruby+graalvm-20.2.0/graalvm/Contents/Home/jre/languages/ruby/lib/gems/gems/puma-3.11.4/lib/puma/thread_pool.rb:120:in `block in spawn_thread'

bjfish avatar Aug 28 '20 14:08 bjfish

@bjfish Yes, that's pretty much all I did... start up the server and hit it with many threads using wrk.

headius avatar Sep 17 '20 20:09 headius

Something that might make a difference is that @headius was running GraalVM on JDK 11 and not JDK 8. Currently we still recommend JDK8 for TruffleRuby because we test it more extensively and JDK11 does not seem to give much. @bjfish Can you try with GraalVM on JDK 11?

eregon avatar Sep 18 '20 09:09 eregon

@headius Could you try with GraalVM JDK 8 instead of 11 and see if it still happens?

eregon avatar Sep 18 '20 09:09 eregon

@bjfish I have a similar experience described here https://github.com/oracle/truffleruby/discussions/2105 and reproduced with a simple Rack app and JVM mode (I use java installed via jt), with Native mode, and with asdf installed versions. It does not depend on RUBYOPT\TRUFFLERUBYOPT (I tried different options)

I believe the crash/degradation depends on:

  1. The number of machine resources
  2. The total number of requests (not simultaneous)

So if you have a hight performance laptop\PC, just increase these numbers: -c 8 -t 1.

The feeling that we are consuming some kind of resource, which eventually ends and degradation occurs. After degradation, the load on the CPU becomes zero, but while warmup (first 1-2 wrk runs), I can warming up my room\myself.

ssnickolay avatar Sep 23 '20 17:09 ssnickolay