ice icon indicating copy to clipboard operation
ice copied to clipboard

Ruby Ice/exceptions test failure

Open bernardnormier opened this issue 1 year ago • 3 comments

I am getting this crash in CI (macos release), but I can't reproduce it on my desktop.

[ running client/server with sliced format test - 07/09/24 20:32:03 ]
- Config: tcp
(/Users/runner/work/ice/ice/cpp/test/Ice/exceptions/build/macosx/shared/server --Ice.Default.Host=127.0.0.1 --Test.BasePort=14000 --Ice.Warn.Connections=1 --Ice.Default.Protocol=tcp --Ice.IPv6=0 --Ice.PrintStackTraces=1 --Ice.Default.SlicedFormat=1 --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Server.SizeMax=3 --Ice.ThreadPool.Server.SizeWarn=0 --Ice.PrintAdapterReady=1)
(ruby  /Users/runner/work/ice/ice/ruby/test/TestHelper.rb Client --Ice.Default.Host=127.0.0.1 --Test.BasePort=14000 --Ice.Warn.Connections=1 --Ice.Default.Protocol=tcp --Ice.IPv6=0 --Ice.Default.SlicedFormat=1 env={'DYLD_LIBRARY_PATH': '/Users/runner/work/ice/ice/cpp/lib', 'RUBYLIB': '/Users/runner/work/ice/ice/ruby/ruby:/Users/runner/work/ice/ice/ruby/test/Ice/exceptions'})
testing value factory registration exception... <OBJ_INFO:[email protected]:7072> 0x0000000101440028 [0 M    ] T_NONE 
/Users/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27: [BUG] try to mark T_NONE object
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

-- Control frame information -----------------------------------------------
c:0007 p:0056 s:0050 e:000046 METHOD /Users/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27
c:0006 p:0006 s:0034 e:000033 BLOCK  /Users/runner/work/ice/ice/ruby/test/Ice/exceptions/Client.rb:16
c:0005 p:0070 s:0029 e:000028 METHOD /Users/runner/work/ice/ice/ruby/test/TestHelper.rb:69
c:0004 p:0031 s:0020 e:000019 METHOD /Users/runner/work/ice/ice/ruby/test/Ice/exceptions/Client.rb:15
c:0003 p:0042 s:0014 e:000013 METHOD /Users/runner/work/ice/ice/ruby/test/TestHelper.rb:91
c:0002 p:0021 s:0006 e:000005 EVAL   /Users/runner/work/ice/ice/ruby/test/TestHelper.rb:107 [FINISH]
c:0001 p:0000 s:0003 E:001a30 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
/Users/runner/work/ice/ice/ruby/test/TestHelper.rb:107:in `<main>'
/Users/runner/work/ice/ice/ruby/test/TestHelper.rb:91:in `run'
/Users/runner/work/ice/ice/ruby/test/Ice/exceptions/Client.rb:15:in `run'
/Users/runner/work/ice/ice/ruby/test/TestHelper.rb:69:in `init'
/Users/runner/work/ice/ice/ruby/test/Ice/exceptions/Client.rb:16:in `block in run'
/Users/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27:in `allTests'

-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1

-- C level backtrace information -------------------------------------------
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_vm_bugreport+0x344) [0x101999a60]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_bug_without_die) [0x1018370fc]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_bug) [0x101a8b7b8]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_gc_mark_weak) [0x10184e7f4]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_gc_mark_vm_stack_values) [0x10184e440]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_execution_context_mark) [0x1019877f0]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(cont_mark) [0x10181b488]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(gc_mark_stacked_objects_all) [0x10185922c]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(gc_marks_rest) [0x101858744]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(gc_start) [0x10185baa4]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(garbage_collect) [0x101850240]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(ruby_xcalloc_body) [0x1018536d8]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_id_table_insert) [0x10194acb0]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(vm_ccs_create) [0x1019899cc]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(callable_method_entry_or_negative) [0x10197e75c]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_callable_method_entry) [0x10197e7b4]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_vm_search_method_slowpath) [0x101974de8]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(vm_search_method_slowpath0) [0x101990c40]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(vm_exec_core) [0x1019790fc]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_vm_exec) [0x1019766e8]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(rb_ec_exec_node) [0x10183ebb8]
/opt/homebrew/Cellar/ruby/3.3.3/lib/libruby.3.3.dylib(ruby_run_node) [0x10183ea94]
/opt/homebrew/Cellar/ruby/3.3.3/bin/ruby(main+0x68) [0x100e7ff24]

-- Other runtime information -----------------------------------------------
...

This line 27 in AllTests.rb is supposed to work fine:

    print "testing value factory registration exception... "
    STDOUT.flush
    vf = ValueFactoryI.new
    communicator.getValueFactoryManager().add(vf, "x")
    begin
        communicator.getValueFactoryManager().add(vf, "x")
        test(false)
    rescue Ice::AlreadyRegisteredException
    end
    communicator.getValueFactoryManager().add(vf, "") # <--- that's line 27
    begin
        communicator.getValueFactoryManager().add(vf, "")
        test(false)
    rescue Ice::AlreadyRegisteredException
    end
    puts "ok"

bernardnormier avatar Jul 10 '24 15:07 bernardnormier

what ruby version do you have? and what version is CI?

externl avatar Jul 10 '24 15:07 externl

I have:

% ruby --version
ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin23]

bernardnormier avatar Jul 10 '24 15:07 bernardnormier

I checked your PR. CI installed:

==> Downloading https://ghcr.io/v2/homebrew/core/ruby/manifests/3.3.3
==> Fetching ruby
==> Downloading https://ghcr.io/v2/homebrew/core/ruby/blobs/sha256:506cb8e8011c02be15e4c13ad6a0bf32335dc8c3a1da8063d8262b4afbe5bd3b
==> Downloading https://ghcr.io/v2/homebrew/core/node/manifests/22.3.0

externl avatar Jul 10 '24 15:07 externl

I reverted the work around in #3059.

bernardnormier avatar Nov 04 '24 21:11 bernardnormier

I am consistently getting this failure on my PR for refactoring metadata on forward declarations: #3224. However, it's only failing on ubuntu, not macos. It isn't specific to any config, I've observed it on the ubuntu release, ubuntu debug, and ubuntu cross workflows.

It always fails on the same line as this issue says (line 27), but I've seen 3 distinct error messages come out of this:

  • [BUG] Segmentation fault at 0x00007fb3dcdd5914
  • [BUG] try to mark T_NONE object
  • integer overflow: 18446744071562067968 * 16 > 18446744073709551615

The memory address of the segmentation fault is always different (but always ends in d5914).

But the numbers present in the integer overflow message are always the same numbers. Which is interesting. Even more interesting is that:

  • The number on the right side (18446744073709551615) is the maximum value you can fit in an unsigned 64bit integer.
  • If you subtract the two (18446744073709551615 - 18446744071562067968) you get a value of 2147483647. This is the maximum value you can fit in a signed 32bit integer.

So, it sounds pretty clear that somewhere in C++ we have an unsigned long long, which somehow gets screwed up. For some reason it gets multiplied by 16 (or maybe << 4?), causing it overflow past the maximum for unsigned long long. It's worth noting that like Python, Ruby's integers have no maximum size.

Below is a list of all the failures I've managed to reproduce in CI


(ubuntu cross: https://github.com/zeroc-ice/ice/actions/runs/12160907142/job/33914273684#step:6:692)

 [ running client/server with 1.0 encoding test - 12/04/24 14:25:23 ]
- Config: tcp
- Mappings: ruby,cpp
...
/home/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27: [BUG] Segmentation fault at 0x00007fb3dcdd5914

(ubuntu debug: https://github.com/zeroc-ice/ice/actions/runs/12160907142/job/33914273290#step:6:2098)

[ running client/server with 1.0 encoding test - 12/04/24 14:04:11 ]
- Config: ssl,compress,ipv6,serialize,mx
...
<OBJ_INFO:[email protected]:7072> 0x00007f7609c70020 [0 M    ] T_NONE 
/home/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27: [BUG] try to mark T_NONE object

(ubuntu release: https://github.com/zeroc-ice/ice/actions/runs/12160907142/job/33914272454#step:6:566)

[ running client/server with sliced format test - 12/04/24 14:23:52 ]
- Config: wss
...
integer overflow: 18446744071562067968 * 16 > 18446744073709551615


On this run ubuntu debug and ubuntu cross succeeded. The only failure was:

(ubuntu release: https://github.com/zeroc-ice/ice/actions/runs/12152484540/job/33890276235#step:6:309)

[ running client/server with compact format test - 12/04/24 04:16:58 ]
- Config: wss
...
/home/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27: [BUG] Segmentation fault at 0x00007f7d603d5914


On this run ubuntu release succeeded, but the other 2 failed with:

(ubuntu cross: https://github.com/zeroc-ice/ice/actions/runs/12151640506/job/33887936073#step:6:380)

[ running client/server with sliced format test - 12/04/24 02:53:30 ]
- Config: ws,compress,ipv6,serialize,mx
- Mappings: ruby,cpp
...
/home/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27: [BUG] Segmentation fault at 0x00007f6a749d5914

(ubuntu debug: https://github.com/zeroc-ice/ice/actions/runs/12151640506/job/33887935836#step:6:131)

[ running client/server with compact format test - 12/04/24 03:10:02 ]
- Config: ssl,compress,ipv6,serialize,mx
...
integer overflow: 18446744071562067968 * 16 > 18446744073709551615


(ubuntu release: https://github.com/zeroc-ice/ice/actions/runs/12162495021/job/33922806968?pr=3230#step:6:493)

 [ running client/server with sliced format test - 12/04/24 15:51:35 ]
- Config: tcp
...
/home/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27: [BUG] Segmentation fault at 0x00007fcadb3d5914

(ubuntu debug: https://github.com/zeroc-ice/ice/actions/runs/12162495021/job/33922807665?pr=3230#step:6:685)

 [ running client/server with sliced format test - 12/04/24 16:11:14 ]
- Config: wss
...
/home/runner/work/ice/ice/ruby/test/Ice/exceptions/AllTests.rb:27: [BUG] Segmentation fault at 0x00007fb5c9fd5914

InsertCreativityHere avatar Dec 04 '24 15:12 InsertCreativityHere

Another failure on Ubuntu

https://github.com/zeroc-ice/ice/actions/runs/13902224012/job/38896578775?pr=3705

pepone avatar Mar 17 '25 15:03 pepone

Looks like it's failing consistently now. The recent ones seem to be limited to cross testing.

https://github.com/zeroc-ice/ice/actions/runs/13903645677/job/38901334106?pr=3706

externl avatar Mar 17 '25 17:03 externl