trafficserver icon indicating copy to clipboard operation
trafficserver copied to clipboard

Intermittent crash with ssntxnorder_verify.so (AuTest openclose_h2)

Open masaori335 opened this issue 4 years ago • 9 comments

The current master branch intermittently crashes with ssntxnorder_verify.so which is a plugin for the openclose_h2 AuTest.

backtrace

(lldb) bt
* thread #3, name = '[ET_NET 1]', stop reason = signal SIGABRT
  * frame #0: 0x00007fff204d3946 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff20502615 libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007fff20457411 libsystem_c.dylib`abort + 120
    frame #3: 0x00000001011f12de libtscore.10.dylib`ink_abort(message_format="%s:%d: failed assertion `%s`") at ink_error.cc:99:3
    frame #4: 0x00000001011ed58c libtscore.10.dylib`::_ink_assert(expression="false", file="ProxySession.cc", line=129) at ink_assert.cc:37:3
    frame #5: 0x00000001003940e4 traffic_server`ProxySession::state_api_callout(this=0x0000000106810800, event=104, data=0x000000017a01fff0) at ProxySession.cc:129:5
    frame #6: 0x000000010000664e traffic_server`Continuation::handleEvent(this=0x0000000106810800, event=104, data=0x000000017a01fff0) at I_Continuation.h:219:12
    frame #7: 0x000000010043263e traffic_server`read_signal_and_update(event=104, vc=0x000000017a01fe00) at UnixNetVConnection.cc:83:24
    frame #8: 0x000000010043254d traffic_server`read_signal_done(event=104, nh=0x000000011814c140, vc=0x000000017a01fe00) at UnixNetVConnection.cc:150:7
    frame #9: 0x0000000100432513 traffic_server`UnixNetVConnection::readSignalDone(this=0x000000017a01fe00, event=104, nh=0x000000011814c140) at UnixNetVConnection.cc:1037:11
    frame #10: 0x00000001003cd1cf traffic_server`SSLNetVConnection::net_read_io(this=0x000000017a01fe00, nh=0x000000011814c140, lthread=0x0000000118148000) at SSLNetVConnection.cc:734:5
    frame #11: 0x000000010041f67d traffic_server`NetHandler::process_ready_list(this=0x000000011814c140) at UnixNet.cc:415:11
    frame #12: 0x000000010041fdc5 traffic_server`NetHandler::waitForActivity(this=0x000000011814c140, timeout=60000000) at UnixNet.cc:546:3
    frame #13: 0x0000000100464ddb traffic_server`EThread::execute_regular(this=0x0000000118148000) at UnixEThread.cc:303:14
    frame #14: 0x0000000100465225 traffic_server`EThread::execute(this=0x0000000118148000) at UnixEThread.cc:364:11
    frame #15: 0x0000000100463520 traffic_server`spawn_thread_internal(a=0x0000000101c15e70) at Thread.cc:92:12
    frame #16: 0x00007fff20502954 libsystem_pthread.dylib`_pthread_start + 224
    frame #17: 0x00007fff204fe4a7 libsystem_pthread.dylib`thread_start + 15

client

h2load -n 16 -m 1 -c 16 -t 8 https://127.0.0.1:4443/

ATS

plugin.config

ssntxnorder_verify.so

masaori335 avatar May 24 '21 07:05 masaori335

I'm not sure this is related or not, but this do_io_read() call breaks the relation of Http2ClientSession and Http2Stream.

* thread #3, name = '[ET_NET 1]', stop reason = step over
    frame #0: 0x00000001000d49b2 traffic_server`HttpSM::state_add_to_list(this=0x000000010780dc00, event=0, (null)=0x0000000000000000) at HttpSM.cc:472:28
   469        // Preserve the current read cont and mutex
   470        NetVConnection *netvc = ((ProxyTransaction *)ua_entry->vc)->get_netvc();
   471        ink_assert(netvc != nullptr);
-> 472        ua_entry->read_vio = ua_entry->vc->do_io_read(netvc->read_vio_cont(), 0, nullptr);
   473      }
   474      return EVENT_CONT;
   475    }
Target 0: (traffic_server) stopped.
(lldb) p ua_entry->vc
(Http2Stream *) $0 = 0x0000000107808e00
(lldb) p netvc->read_vio_cont()
(Http2ClientSession *) $1 = 0x0000000107809e00

masaori335 avatar May 24 '21 07:05 masaori335

#7586 ?

maskit avatar May 24 '21 07:05 maskit

Thanks, I haven't noticed that. It's fixing the odd do_io_read() call at least. I'll try the change and review it.

masaori335 avatar May 24 '21 07:05 masaori335

Unfortunately, #7586 still crashes 😢

masaori335 avatar May 24 '21 07:05 masaori335

I'm a bit confused. It is crashing the openclose_h2 autest? Or if you include the ssntxnorder_verify.so test plugin and run h2load against it it will crash eventually? I assume the later and the compiler optimizations have eaten the frame headers specifically associated with ssntxnorder_verify.so

shinrich avatar May 24 '21 16:05 shinrich

Later. I got crash when I run h2load against ATS with the ssntxnorder_verify.so plugin (debug build).

It looks like the Http2ClientSession got EOS on ProxySession::state_api_callout, due to the _vc->do_io_close() is not called before Http2ClientSession::destroy() is called in some path.

masaori335 avatar May 24 '21 23:05 masaori335

Making sure that disabling read/write of VC at Http2ClientSession::destroy() could be a workaround fix. I confirmed below change can fix the crash. But it looks like this is one of the issues of the closing path is complicated. I'd like to cover this case by #7845.

@@ -72,6 +72,11 @@ Http2ClientSession::Http2ClientSession() : super() {}
 void
 Http2ClientSession::destroy()
 {
+  if (_vc) {
+    _vc->do_io_close();
+    _vc = nullptr;
+  }
+

masaori335 avatar May 25 '21 04:05 masaori335

The workaround is something similar to what I tried to do on #7595, which couldn't pass some of autests. I agree that a complete fix would be something like #7845.

maskit avatar May 25 '21 07:05 maskit

This issue has been automatically marked as stale because it has not had recent activity. Marking it stale to flag it for further consideration by the community.

github-actions[bot] avatar May 26 '22 02:05 github-actions[bot]