Fiber stack issues with Crystal 1.13

Open phil294 opened this issue 1 year ago • 3 comments

$ bin/ahk_x11.dev <<<'a = 1
b := a
echo %b%'
[debug] 0: a = 1
[debug] 1: b := a
[debug] 2: echo %b%
[debug] 3: 
[debug] x11: root_win = 1500
[debug] x11: _NET_ACTIVE_WINDOW = 346
[debug] x11: active window detection: 77594635
Failed to raise an exception: END_OF_STACK
[0x66de46] *Exception::CallStack::print_backtrace:Nil +118 in bin/ahk_x11.dev
[0x61a334] __crystal_raise +52 in bin/ahk_x11.dev
[0x62c461] ?? +6472801 in bin/ahk_x11.dev
[0x62c3e7] ?? +6472679 in bin/ahk_x11.dev
[0x76f05f] *Thread#main_fiber:Fiber +63 in bin/ahk_x11.dev
[0x76db52] *Crystal::Scheduler#initialize<Thread>:Deque(Fiber) +34 in bin/ahk_x11.dev
[0x76db21] *Crystal::Scheduler::new<Thread>:Crystal::Scheduler +129 in bin/ahk_x11.dev
[0x76eff5] *Thread#scheduler:Crystal::Scheduler +53 in bin/ahk_x11.dev
[0x76dc0e] *Crystal::Scheduler::event_loop:Crystal::EventLoop+ +14 in bin/ahk_x11.dev
[0x7704c6] *Crystal::EventLoop::current:Crystal::EventLoop+ +6 in bin/ahk_x11.dev
[0x7c3fa6] *File +6 in bin/ahk_x11.dev
[0x7c490a] *File +10 in bin/ahk_x11.dev
[0x7c48fc] *File +28 in bin/ahk_x11.dev
[0x7c3d77] *File +103 in bin/ahk_x11.dev
[0xa4c68a] *Compiler#finalize:Nil +10 in bin/ahk_x11.dev
[0x6454e6] ~proc11Proc(Pointer(Void), Pointer(Void), Nil) +6 in bin/ahk_x11.dev
[0x7fd38c9d70df] GC_invoke_finalizers +79 in /usr/lib/libgc.so.1
[0x7fd38c9d730e] ?? +140546573955854 in /usr/lib/libgc.so.1
[0x7fd38c9e2095] GC_generic_malloc +37 in /usr/lib/libgc.so.1
[0x7fd38c9e23d7] GC_malloc_kind_global +295 in /usr/lib/libgc.so.1
[0x6f8ece] *GC::malloc<UInt64>:Pointer(Void) +62 in bin/ahk_x11.dev
[0x6198de] __crystal_malloc64 +14 in bin/ahk_x11.dev
[0x76ba6c] *Fiber::new<Pointer(Void), Thread>:Fiber +28 in bin/ahk_x11.dev
[0x76f35f] *Thread#start:(Exception+ | Nil) +63 in bin/ahk_x11.dev
[0x63de86] ~procProc(Pointer(Void), Pointer(Void)) +6 in bin/ahk_x11.dev
[0x7fd38c9f098e] ?? +140546574059918 in /usr/lib/libgc.so.1
[0x7fd38c9ee349] GC_call_with_stack_base +41 in /usr/lib/libgc.so.1
[0x7fd38b479ded] ?? +140546551553517 in /usr/lib/libc.so.6
[0x7fd38b4fd0dc] ?? +140546552090844 in /usr/lib/libc.so.6
[0x0] ???

Tried to raise:: Thread#main_fiber cannot be nil (NilAssertionError)
Unable to load dwarf information: Thread#main_fiber cannot be nil (NilAssertionError)
  from bin/ahk_x11.dev in 'Exception::CallStack::unwind:Array(Pointer(Void))'
  from bin/ahk_x11.dev in 'Exception::CallStack#initialize:Array(Pointer(Void))'
  from bin/ahk_x11.dev in 'Exception::CallStack::new:Exception::CallStack'
  from bin/ahk_x11.dev in 'raise<NilAssertionError>:NoReturn'
  from bin/ahk_x11.dev in 'Thread#main_fiber:Fiber'
  from bin/ahk_x11.dev in 'Crystal::Scheduler#initialize<Thread>:Deque(Fiber)'
  from bin/ahk_x11.dev in 'Crystal::Scheduler::new<Thread>:Crystal::Scheduler'
  from bin/ahk_x11.dev in 'Thread#scheduler:Crystal::Scheduler'
  from bin/ahk_x11.dev in 'Crystal::Scheduler::event_loop:Crystal::EventLoop+'
  from bin/ahk_x11.dev in 'Crystal::EventLoop::current:Crystal::EventLoop+'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'Exception::CallStack::read_dwarf_sections<String, UInt64>:(Array(Tuple(UInt64, UInt64, String)) | Nil)'
  from bin/ahk_x11.dev in 'Exception::CallStack::load_debug_info_impl:Nil'
  from bin/ahk_x11.dev in 'Exception::CallStack::load_debug_info:Nil'
  from bin/ahk_x11.dev in 'Exception::CallStack::decode_line_number<UInt64>:Tuple(String, Int32, Int32)'
  from bin/ahk_x11.dev in 'Exception::CallStack#decode_backtrace:Array(String)'
  from bin/ahk_x11.dev in 'Exception::CallStack#printable_backtrace:Array(String)'
  from bin/ahk_x11.dev in 'Exception+'
  from bin/ahk_x11.dev in 'Crystal::System::print_exception<String, Exception+>:Nil'
  from bin/ahk_x11.dev in '__crystal_raise'
  from bin/ahk_x11.dev in '??'
  from bin/ahk_x11.dev in '??'
  from bin/ahk_x11.dev in 'Thread#main_fiber:Fiber'
  from bin/ahk_x11.dev in 'Crystal::Scheduler#initialize<Thread>:Deque(Fiber)'
  from bin/ahk_x11.dev in 'Crystal::Scheduler::new<Thread>:Crystal::Scheduler'
  from bin/ahk_x11.dev in 'Thread#scheduler:Crystal::Scheduler'
  from bin/ahk_x11.dev in 'Crystal::Scheduler::event_loop:Crystal::EventLoop+'
  from bin/ahk_x11.dev in 'Crystal::EventLoop::current:Crystal::EventLoop+'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'Compiler#finalize:Nil'
  from bin/ahk_x11.dev in '~proc11Proc(Pointer(Void), Pointer(Void), Nil)'
  from /usr/lib/libgc.so.1 in 'GC_invoke_finalizers'
  from /usr/lib/libgc.so.1 in '??'
  from /usr/lib/libgc.so.1 in 'GC_generic_malloc'
  from /usr/lib/libgc.so.1 in 'GC_malloc_kind_global'
  from bin/ahk_x11.dev in 'GC::malloc<UInt64>:Pointer(Void)'
  from bin/ahk_x11.dev in '__crystal_malloc64'
  from bin/ahk_x11.dev in 'Fiber::new<Pointer(Void), Thread>:Fiber'
  from bin/ahk_x11.dev in 'Thread#start:(Exception+ | Nil)'
  from bin/ahk_x11.dev in '~procProc(Pointer(Void), Pointer(Void))'
  from /usr/lib/libgc.so.1 in '??'
  from /usr/lib/libgc.so.1 in 'GC_call_with_stack_base'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '??'
  from ???
  from bin/ahk_x11.dev in 'Exception::CallStack::unwind:Array(Pointer(Void))'
  from bin/ahk_x11.dev in 'Exception::CallStack#initialize:Array(Pointer(Void))'
  from bin/ahk_x11.dev in 'Exception::CallStack::new:Exception::CallStack'
  from bin/ahk_x11.dev in 'raise<NilAssertionError>:NoReturn'
  from bin/ahk_x11.dev in 'Thread#main_fiber:Fiber'
  from bin/ahk_x11.dev in 'Crystal::Scheduler#initialize<Thread>:Deque(Fiber)'
  from bin/ahk_x11.dev in 'Crystal::Scheduler::new<Thread>:Crystal::Scheduler'
  from bin/ahk_x11.dev in 'Thread#scheduler:Crystal::Scheduler'
  from bin/ahk_x11.dev in 'Crystal::Scheduler::event_loop:Crystal::EventLoop+'
  from bin/ahk_x11.dev in 'Crystal::EventLoop::current:Crystal::EventLoop+'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'File'
  from bin/ahk_x11.dev in 'Compiler#finalize:Nil'
  from bin/ahk_x11.dev in '~proc11Proc(Pointer(Void), Pointer(Void), Nil)'
  from /usr/lib/libgc.so.1 in 'GC_invoke_finalizers'
  from /usr/lib/libgc.so.1 in '??'
  from /usr/lib/libgc.so.1 in 'GC_generic_malloc'
  from /usr/lib/libgc.so.1 in 'GC_malloc_kind_global'
  from bin/ahk_x11.dev in 'GC::malloc<UInt64>:Pointer(Void)'
  from bin/ahk_x11.dev in '__crystal_malloc64'
  from bin/ahk_x11.dev in 'Fiber::new<Pointer(Void), Thread>:Fiber'
  from bin/ahk_x11.dev in 'Thread#start:(Exception+ | Nil)'
  from bin/ahk_x11.dev in '~procProc(Pointer(Void), Pointer(Void))'
  from /usr/lib/libgc.so.1 in '??'
  from /usr/lib/libgc.so.1 in 'GC_call_with_stack_base'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '??'
  from ???

I have no idea what this is or how to fix it, so I'll downgrade to Crystal 1.11 for the time being. It is likely related to the fiber run override in hacks.cr.

phil294, Aug 04 '24 10:08

Is this fixable with 1.15? Seems like a lot has changed since https://github.com/crystal-lang/crystal/commit/c3eb0eb62fe7a047516962497f6afabae3fd68a3

donovanglover, Feb 17 '25 01:02

Builds for me when removing the custom Fiber code:

https://github.com/phil294/AHK_X11/blob/66eb5208d95f4239822053c7d35f32bc62d57573/src/hacks.cr#L50-L88

It seems some wait_readable changes broke ./src/run/display/x11.cr in 1.15, which is a separate issue. It looks like these were internal APIs not meant to be used by users.

donovanglover, Feb 20 '25 14:02

Thanks for looking this up. This is unfortunate, but I can't be bothered to chase after a replacement for something that was needlessly removed. If the Crystal team decides to break stuff, I'd rather just pin version 1.11, never upgrade again and be done with it.

phil294, Feb 20 '25 23:02

Here are some tips to help with upgrading Crystal.

First, the IO#wait_readable and IO#wait_writable methods have been moved; the following methods are available instead:

Crystal::EventLoop.current.wait_readable(io)
Crystal::EventLoop.current.wait_writable(io)

At worst, you could compile with -Devloop=libevent but we make no guarantee that we'll keep libevent support, and the latest Crystal releases just call epoll directly (one less dependency).

Next, overriding Fiber#run requires copy-pasting everything while it keeps evolving internally... but wrapping spawn is simple (don't override it, because it keeps evolving too):

def my_spawn(*, name = nil, &block)
  ::spawn(name: name) do
    block.call
  rescue ex
    if handler = Hacks.fiber_on_unhandled_exception
      handler.call(ex)
    else
      raise ex
    end
  end
end

You can however have a Hacks.spawn or define a spawn method inside your namespace, so any spawn will transparently go through the wrapper.
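
As a rough illustration of the namespaced variant, here is a minimal sketch. It assumes that Hacks.fiber_on_unhandled_exception is a nilable proc taking the exception (the name comes from the snippet above; its real type in AHK_X11 is an assumption), and the `seen` array exists only for demonstration:

```crystal
# Sketch only: the handler's type is an assumption, not AHK_X11's actual code.
module Hacks
  class_property fiber_on_unhandled_exception : Proc(Exception, Nil)?

  # Namespaced wrapper around the top-level ::spawn; ::spawn itself is untouched.
  def self.spawn(*, name : String? = nil, &block : ->)
    ::spawn(name: name) do
      block.call
    rescue ex
      if handler = Hacks.fiber_on_unhandled_exception
        handler.call(ex)
      else
        raise ex
      end
    end
  end
end

# Demonstration: record the message of any exception that escapes a fiber.
seen = [] of String
Hacks.fiber_on_unhandled_exception = ->(ex : Exception) { seen << (ex.message || ""); nil }

Hacks.spawn(name: "demo") { raise "x" }
Fiber.yield # give the spawned fiber a chance to run
```

Callers then write Hacks.spawn { ... } instead of spawn { ... }; nothing in the standard library is reopened, so the wrapper should keep working as Fiber internals change.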

We strive to avoid breaking the public API as much as possible. The changes proposed above don't use any internal API (the Crystal::EventLoop interface is standardized) and are forward compatible.

ysbaddaden, Jul 22 '25 12:07

Hello @ysbaddaden,

thanks, these are some very valuable suggestions! I'm glad we have a wait_readable again now, and wrapping spawn as you suggested seems like a good idea.

I'm not working on this project currently, but will fix it once I do.

...always open for PRs of course, should anyone else feel like it.

phil294, Jul 23 '25 19:07

The suggestions have been implemented in #112 and cda4acef6fa91feadb2d7bf1d2f675fe08997d82. I thought everything was fine, but after building it, running ./ahk_x11.AppImage (which should open the installer) still results in errors. It could be because of the AppImage wrapper, Crystal changes, the custom better_spawn override (though I don't think so), or anything else. I'm still building on Ubuntu 20.04 for compatibility. I added a better_spawn do raise "x" end and that worked as expected. To make things worse, it only happens in the production build (attached), where the stack trace is useless...

Failed to raise an exception: END_OF_STACK
[0x45bd79] ?? +4570489 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x41f0a8] ?? +4321448 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x429aab] ?? +4364971 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x429a73] ?? +4364915 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x4298b5] ?? +4364469 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x44e17f] ?? +4514175 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x5c64ee] GC_invoke_finalizers +158 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x5c6628] GC_notify_or_invoke_finalizers +152 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x5c7bc3] GC_generic_malloc +35 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x5c7e61] GC_malloc_kind_global +225 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x45b1d1] ?? +4567505 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x5da031] GC_inner_start_routine +81 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x5cdca3] GC_call_with_stack_base +19 in /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped
[0x7fdbf5c969cb] ?? +140582698183115 in /usr/lib/libc.so.6
[0x7fdbf5d1aa0c] ?? +140582698723852 in /usr/lib/libc.so.6
[0x0] ???

Tried to raise:: Thread#main_fiber cannot be nil (NilAssertionError)
Unable to load dwarf information: Thread#main_fiber cannot be nil (NilAssertionError)
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_invoke_finalizers'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_notify_or_invoke_finalizers'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_generic_malloc'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_malloc_kind_global'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_inner_start_routine'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_call_with_stack_base'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '??'
  from ???
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_invoke_finalizers'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_notify_or_invoke_finalizers'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_generic_malloc'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_malloc_kind_global'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in '??'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_inner_start_routine'
  from /tmp/.mount_ahk_x1U5gY6G/AppRun.wrapped in 'GC_call_with_stack_base'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '??'
  from ???

I'm sure there's a good reason for it somewhere but I haven't been able to find a clue yet.

phil294, Sep 04 '25 21:09

Do you compile with --no-debug for releases? That would explain why there's no dwarf info and thus an unhelpful backtrace.

This is happening in a finalizer, so the culprit is likely one of the #finalize methods, and after a quick search I'm pretty sure this is https://github.com/phil294/AHK_X11/blob/f5375887dec3953c4cb3d78271821645bc3840f2/src/compiler.cr#L13-L15.

You can skip this finalizer: the @bin_file will be collected along with the Compiler, and File#finalize will close it.
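
A minimal sketch of what that could look like, using a hypothetical stand-in for the Compiler class (the constructor, the #close and #closed? helpers, and the temp-file usage are all assumptions for illustration, not the project's actual code):

```crystal
# Hypothetical stand-in for AHK_X11's Compiler; the real class differs.
class Compiler
  @bin_file : File

  def initialize(path : String)
    @bin_file = File.open(path)
  end

  # Intentionally no `def finalize` here: File has its own finalizer, so the
  # handle is released once the Compiler and its @bin_file are collected.
  # Callers that want deterministic cleanup can call #close explicitly.
  def close : Nil
    @bin_file.close
  end

  def closed? : Bool
    @bin_file.closed?
  end
end

tmp = File.tempfile("compiler_demo")
compiler = Compiler.new(tmp.path)
compiler.close
closed = compiler.closed?
tmp.delete
```

The point of the explicit #close is that no user code runs during garbage collection, which is where the first backtrace shows Compiler#finalize calling into File and Crystal::EventLoop.current.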

ysbaddaden, Sep 05 '25 09:09

> Do you compile with --no-debug for releases? That would explain why there's no dwarf info and thus an unhelpful backtrace.

@ysbaddaden No, just with --release as you should, but this seems to imply --no-debug even though the help text doesn't say so. If I do --debug --release, the symbols aren't present either. Not sure why; having this work would be pretty useful actually.

However, adding --debug to --release actually solved some of my segfaults. I have no idea what is going on. I'll just publish this version now and close the issue.

> This is happening in a finalizer, so the culprit is likely one of the #finalize method, and after a quick search I'm pretty sure this is AHK_X11/src/compiler.cr

No, I'm afraid that was not it, as this line isn't called in my tests; it is only used to create stand-alone binaries for end users.

> You can skip this finalizer: the @bin_file will be collected along the Compiler and File#finalize will close it.

Nice, I'll apply that now either way!

phil294, Sep 12 '25 17:09