Truffleruby loses $! after `rb_exc_raise`
While developing ruby-pg I noticed a bug in Truffleruby. I replaced the pg gem by openssl, to use a stdlib:
require "openssl"
begin
raise StandardError, "aaa"
rescue Exception => err
begin
# this works since it generates an ArgumentError before calling into C-ext:
# OpenSSL::Random.random_bytes
# but this loses the exception context, since it raises ArgumentError from a C-ext:
OpenSSL::Random.random_bytes(-1)
rescue ArgumentError
end
# should raise StandardError "aaa" but raises RuntimeError ""
raise
# this works and raises the StandardError "aaa"
# raise err
end
It looks to me like a call to rb_exc_raise or sibling in a C-extension clear the exception context, so that the second raise in the sample above acts like an ordinary raise outside of the rescue branch.
Context:
$ ruby -v
truffleruby 22.3.1, like ruby 3.0.3, GraalVM CE Native [x86_64-linux]
Thank you for the report.
I had a quick look and I observed $! isn't set just before the raise no-args.
And interestingly it does work fine when using raise ArgumentError, "ruby raise" instead of OpenSSL::Random.random_bytes(-1).
I'll debug it further, I thought it's simply rb_define_method missing an ensure but that doesn't seem to fix it.
This seems caused by this change: https://github.com/oracle/truffleruby/commit/19f0e6af33651e177691a5ecb5a7a8b1dc3e8409
Reverting the changes in cext_ruby.rb make the example in the description work correctly.
So it seems the semantics $! around calls to C extensions are more complicated than what we have currently.
I tried looking in CRuby if/how they clear $! but I couldn't find any special handling of $! for C functions (VM_METHOD_TYPE_CFUNC, vm_call0_body->vm_call0_cfunc_with_frame).
There is some handling of errinfo in rb_vrescue2(), maybe that's related.
I wonder if begin/rescue/end is supposed to behave like save $! on begin and restore it at end.
Right now we do the save/restore only if an exception happens and if it's handled by a rescue clause in TruffleRuby, and so the save is done just before executing the rescue clause, not as soon as the begin.
That might help (but would be some performance overhead).
I found quite a few place doing ->errinfo = Qnil or rb_set_errinfo(Qnil);, it seems usually done at the time of starting to throw an exception. TruffleRuby doesn't set $! on throwing.
The strange behavior is the behavior of this spec which passes on CRuby:
def err
$!
end
it "is cleared when entering a C method" do
$!.should == nil
begin
raise StandardError
rescue
$!.class.should == StandardError
err.class.should == StandardError
@s.rb_errinfo().should == nil
end
end
I don't understand why rb_errinfo() is nil there and what sets it to nil, that's weird.
OTOH rb_raise()->rb_longjmp()->exc_setup_message()->errinfo_place() has some complex logic where it gets the errinfo from a Ruby frame on the stack instead of just reading ec->errinfo. Maybe that's part of the magic why the ArgumentError has the StandardError as cause in the first example. But it still doesn't explain that spec.
@aardvark179 Would you remember where/how CRuby sets rb_errinfo to nil on calls to C methods?
I'm looking at this a bit again, @nirvdrum reported a similar issue. With this script with more debug prints:
require "openssl"
begin
p in_first_begin: $!
raise StandardError, "aaa"
rescue Exception
p in_first_rescue: $!
begin
# this works since it generates an ArgumentError before calling into C-ext:
# OpenSSL::Random.random_bytes
# but this loses the exception context, since it raises ArgumentError from a C-ext:
OpenSSL::Random.random_bytes(-1)
p in_second_begin: $!
rescue ArgumentError
p in_second_rescue: $!
end
p after_second_begin_rescue_end: $!
# should raise StandardError "aaa" but raises RuntimeError ""
raise
# this works and raises the StandardError "aaa"
# raise err
end
We get:
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
{:in_first_begin=>nil}
{:in_first_rescue=>#<StandardError: aaa>}
{:in_second_rescue=>#<ArgumentError: negative string size (or size too big)>}
{:after_second_begin_rescue_end=>#<StandardError: aaa>}
cext_exc.rb:5:in `<main>': aaa (StandardError)
truffleruby 24.0.1, like ruby 3.2.2, Oracle GraalVM Native [x86_64-linux]
{:in_first_begin=>nil}
{:in_first_rescue=>#<StandardError: aaa>}
{:in_second_rescue=>#<ArgumentError: negative string size (or size too big)>}
{:after_second_begin_rescue_end=>nil}
cext_exc.rb:21:in `<main>':
unhandled exception
So $! is nil at after_second_begin_rescue_end, because when exiting the nested rescue we reset $! to the value it had before entering the rescue, which is nil because https://github.com/oracle/truffleruby/blob/1ee958f355db44a95bf961aa626b7f7120d39425/lib/truffle/truffle/cext_ruby.rb#L37
If we just revert https://github.com/oracle/truffleruby/commit/19f0e6af33651e177691a5ecb5a7a8b1dc3e8409 changes in cext_ruby.rb we fail that spec though (OTOH, the rest of :cext specs passes, and jt test mri --cext too, but I think @aardvark179 did this change to fix pg or some other C ext so we should check that).
I filed a CRuby issue for this because it's not clear to me if clearing $! on calling a method defined in C is intended or a CRuby bug: https://bugs.ruby-lang.org/issues/20455
After thinking about this more and what I understood from the CRuby semantics and discussion with Koichi, I think the best fix for now is to use a completely separate storage for rb_errinfo/rb_set_errinfo, i.e. to have a new field in RubyFiber for that. And so $! is completely independent of rb_errinfo, which seems the case in CRuby except when there is no rescue and no ensure on the stack (which seems rare overall and something no one should rely on).
@andrykonchin Could you do that, and add a spec based on the description's example?
BTW, currently we have ThreadLocalGlobals in RubyThread for both $! and $?.
$? is indeed per thread and not per Fiber:
$ ruby -ve 'Fiber.new { `true`; p $? }.resume; p $?'
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
#<Process::Status: pid 89794 exit 0>
#<Process::Status: pid 89794 exit 0>
But $! should be per Fiber:
$ ruby -ve 'Fiber.new { begin; raise "hi"; rescue; p $!; Fiber.yield; p :a; end }.resume; p $!'
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
#<RuntimeError: hi>
nil
So $? should become a direct field of RubyThread, and $? a direct field of RubyFiber.
Neither should be added to getAdjacentObjects since they are only reachable from the current thread/fiber (same as before).
I managed to reproduce a similar error, but using yaml:
require 'yaml'
def function
raise StandardError, "custom message"
rescue StandardError
puts $!.message
data = "yaml: str"
valid_result = YAML.unsafe_load(data)
puts "Valid yaml" if valid_result == {"yaml" => "str"}
raise
end
begin
function
rescue Exception => e
puts "#{e.class}: '#{e.message}'"
puts e.backtrace
end
ruby -v
truffleruby 24.0.1, like ruby 3.2.2, Oracle GraalVM Native [aarch64-darwin]