HyperFastCgi icon indicating copy to clipboard operation
HyperFastCgi copied to clipboard

Getting SIGABRTs

Open derFunk opened this issue 8 years ago • 3 comments

Hi, We're using supervisord which keeps mono-server-hyperfastcgi4 running, using the most current revision of HFC and Mono 4.2.1. Our log shows that HFC crashed from time to time due to SIGABRT, "exit status 1" and SIGSEV.

That's the log:

2016-01-20 19:12:27,622 INFO exited: mono-server-hyperfastcgi4 (terminated by SIGABRT (core dumped); not expected)

2016-01-20 19:12:28,624 INFO spawned: 'mono-server-hyperfastcgi4' with pid 3581
2016-01-20 19:12:30,459 INFO success: mono-server-hyperfastcgi4 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-01-20 19:12:38,284 INFO exited: mono-server-hyperfastcgi4 (terminated by SIGABRT (core dumped); not expected)

2016-01-20 19:12:39,285 INFO spawned: 'mono-server-hyperfastcgi4' with pid 3662
2016-01-20 19:12:40,478 INFO success: mono-server-hyperfastcgi4 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-01-20 21:14:26,362 INFO exited: mono-server-hyperfastcgi4 (terminated by SIGABRT (core dumped); not expected)

2016-01-20 21:14:27,365 INFO spawned: 'mono-server-hyperfastcgi4' with pid 3874
2016-01-20 21:14:29,171 INFO success: mono-server-hyperfastcgi4 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-01-21 02:09:13,084 INFO exited: mono-server-hyperfastcgi4 (exit status 1; not expected)

2016-01-21 02:09:14,087 INFO spawned: 'mono-server-hyperfastcgi4' with pid 4259
2016-01-21 02:09:15,599 INFO success: mono-server-hyperfastcgi4 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-01-21 03:41:36,285 INFO exited: mono-server-hyperfastcgi4 (terminated by SIGABRT (core dumped); not expected)

2016-01-21 14:38:16,069 INFO spawned: 'mono-server-hyperfastcgi4' with pid 5281
2016-01-21 14:38:17,867 INFO success: mono-server-hyperfastcgi4 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-01-21 16:15:30,754 INFO exited: mono-server-hyperfastcgi4 (terminated by SIGSEGV (core dumped); not expected)

2016-01-21 16:15:31,856 INFO spawned: 'mono-server-hyperfastcgi4' with pid 5498

Currently I'm trying to attach gdb to catch the SIGABRT signal ((gdb) catch signal SIGABRT) to hopefully get a stacktrace. For that I have to generate debug symbols for the libnative.

Any hints what the reason of this could be, or how to find out what the reason is? Any help is appreciated, even if it is generic debugging help.

Thanks!

derFunk avatar Jan 26 '16 08:01 derFunk

I'm seeing this in the console logs of HFC:

This happens obviously on Startup in the static NativeTransport constructor when calling RegisterIcall.

I think this is not related to the SIGABRT (Because SIGSEV) but since I'm trying to find out the reasons of all crashes we're experiencing I'm keeping the log here:

...
libev.c:317: cmd_error():       Remote host disconnected from fd 14.
libev.c:317: cmd_error():       Remote host disconnected from fd 18.
[2016-01-26 01:39:15Z] Debug   HyperFastCgi
[2016-01-26 01:39:15Z] Debug   Register native transport
[2016-01-26 01:39:15Z] Debug   Configured host in domain d95d475a, id=1
Stacktrace:


Native stacktrace:

        /usr/bin/mono() [0x49cf0c]
        /usr/bin/mono() [0x4f2d5e]
        /usr/bin/mono() [0x4249dd]
        /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0) [0x7f17fe0c68d0]
        /usr/bin/mono() [0x5f5aab]
        /usr/bin/mono() [0x5f67aa]
        /usr/bin/mono() [0x5ecb03]
        /usr/bin/mono() [0x5dbf76]
        /usr/bin/mono() [0x5e380d]
        /usr/bin/mono() [0x5f99f8]
        /usr/bin/mono() [0x5e5f63]
        /usr/bin/mono() [0x5e805f]
        /usr/bin/mono() [0x5db625]
        /usr/bin/mono() [0x5db92b]
        /usr/bin/mono() [0x5c9c49]
        [0x40136ef3]

Debug info from gdb:

host-list.c:38: register_host():        (null):-1:/:/usr/aspnet/ host=0x7f17fd058d40 pinned_host=0x7f17fd058d40 domain=0x11ec570
libev.c:461: Listen():  libevent version: 2.0.21-stable
libev.c:475: Listen():  libevent is using epoll for events.

=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

host-list.c:38: register_host():        (null):-1:/:/usr/aspnet/ host=0x7f17fd058d40 pinned_host=0x7f17fd058d40 domain=0x11ec570
libev.c:461: Listen():  libevent version: 2.0.21-stable
libev.c:475: Listen():  libevent is using epoll for events.
[2016-01-26 04:29:06Z] Debug   HyperFastCgi
[2016-01-26 04:29:06Z] Debug   Register native transport
[2016-01-26 04:29:06Z] Debug   Configured host in domain d95d475a, id=1

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
  at System.Threading.WaitHandle.WaitOne (Int32 millisecondsTimeout, Boolean exitContext) <0x41513160 + 0x000dc> in <filename unknown>:0
  at System.Threading.WaitHandle.WaitOne (Int32 millisecondsTimeout) <0x41513130 + 0x0001a> in <filename unknown>:0
  at System.Threading.Timer+Scheduler.SchedulerThread () <0x415120f0 + 0x00bd2> in <filename unknown>:0
  at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) <0x41511fe0 + 0x00083> in <filename unknown>:0
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, Boolean preserveSyncCtx) <0x41511b70 + 0x0021a> in <filename unknown>:0
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, Boolean preserveSyncCtx) <0x41511b30 + 0x00028> in <filename unknown>:0
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) <0x41511a40 + 0x00061> in <filename unknown>:0
  at System.Threading.ThreadHelper.ThreadStart () <0x415119b0 + 0x00034> in <filename unknown>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
 at System.Threading.ThreadHelper.ThreadStart () <0x415119b0 + 0x00034> in <filename unknown>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
  at System.Threading.WaitHandle.WaitOne (Int32 millisecondsTimeout, Boolean exitContext) <0x41513160 + 0x000dc> in <filename unknown>:0
  at System.Threading.WaitHandle.WaitOne (Int32 millisecondsTimeout) <0x41513130 + 0x0001a> in <filename unknown>:0
  at System.Threading.Timer+Scheduler.SchedulerThread () <0x415120f0 + 0x00bd2> in <filename unknown>:0
  at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) <0x41511fe0 + 0x00083> in <filename unknown>:0
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, Boolean preserveSyncCtx) <0x41511b70 + 0x0021a> in <filename unknown>:0
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, Boolean preserveSyncCtx) <0x41511b30 + 0x00028> in <filename unknown>:0
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) <0x41511a40 + 0x00061> in <filename unknown>:0
  at System.Threading.ThreadHelper.ThreadStart () <0x415119b0 + 0x00034> in <filename unknown>:0
host-list.c:38: register_host():        (null):-1:/:/usr/aspnet/ host=0x7fa5e0059d40 pinned_host=0x7fa5e0059d40 domain=0x12205a0
libev.c:461: Listen():  libevent version: 2.0.21-stable
libev.c:475: Listen():  libevent is using epoll for events.
[2016-01-26 05:53:38Z] Debug   HyperFastCgi
...

Also the line libev.c:317: cmd_error(): Remote host disconnected from fd 21. appears very often.

derFunk avatar Jan 26 '16 10:01 derFunk

Using the new Mono 4.2.2.30 seems promising. We got higher performance and no more crashes. I'll update you when we have collected more information and experience with 4.2.2.30

derFunk avatar Feb 17 '16 14:02 derFunk

We have intensively tested our server with hfc yesterday by using load/stress tests.

When using the managed Listener on Mono 4.2.1, we used to achieve 130 rps and 5% failed responses due to crashes/timeouts in average.

When using the native listener on Mono 4.2.1 we used to achieve around 85% of failed responses, mostly due to sigabrt.

Wenn using the managed Listener on Mono 4.2.2 we achieved failed responses in the 0-0.5% field, and 130 rps in average. CPU was >100% with the managed Listener of course. No sigabrt, just Timeouts because of cpu limits.

Using the native listener with 4.2.2 gave us double rps compared to the managed one, while CPU usage was at 20-25%. The error rate was 0-0.5%. No crashes, the errors were minor.

We tested with and without involving I/O in the request handling, both were fine.

We're now rolling this out on WIP projects to monitor the stability.

derFunk avatar Feb 18 '16 08:02 derFunk