trafficserver icon indicating copy to clipboard operation
trafficserver copied to clipboard

Abort in HttpSM::attach_server_session after an EOS event

Open bneradt opened this issue 8 months ago • 2 comments

We're seeing the following abort from time to time in our ATS 10 deployed instances:

Fatal: /sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc:6513: failed assertion `server_entry == nullptr`

The bt:

#0  0x00007f5c68a51acf in raise () from /lib64/libc.so.6
#1  0x00007f5c68a24ea5 in abort () from /lib64/libc.so.6
#2  0x00007f5c6a1870ac in ink_abort (message_format=message_format@entry=0x7f5c6a19abcf "%s:%d: failed assertion `%s`") at src/tscore/ink_error.cc:99
#3  0x00007f5c6a184566 in _ink_assert (expression=expression@entry=0x8e8466 "server_entry == nullptr", file=file@entry=0x8e8a50 "src/proxy/http/HttpSM.cc", line=line@entry=6513) at src/tscore/ink_assert.cc:35
#4  0x0000000000596f00 in HttpSM::attach_server_session (this=this@entry=0x7f4970f0e000) at src/proxy/http/HttpSM.cc:6513
#5  0x0000000000596f7f in HttpSM::create_server_txn (this=this@entry=0x7f4970f0e000, new_session=new_session@entry=0x7f4973540300) at src/proxy/http/HttpSM.cc:1732
#6  0x00000000005a372b in HttpSM::state_http_server_open (this=0x7f4970f0e000, event=1400, data=0x7f4973540300) at src/proxy/http/HttpSM.cc:1790
#7  0x000000000059fc1f in HttpSM::main_handler (data=0x7f4973540300, event=1400, this=0x7f4970f0e000) at src/proxy/http/HttpSM.cc:2628
#8  HttpSM::main_handler (this=0x7f4970f0e000, event=1400, data=0x7f4973540300) at src/proxy/http/HttpSM.cc:2598
#9  0x00000000005eb064 in Continuation::handleEvent (data=0x7f4973540300, event=1400, this=0x7f4970f0e000) at include/iocore/eventsystem/Continuation.h:228
#10 Continuation::handleEvent (data=0x7f4973540300, event=1400, this=0x7f4970f0e000) at include/iocore/eventsystem/Continuation.h:224
#11 ConnectingEntry::state_http_server_open (this=0x7f528ba89e00, event=<optimized out>, data=<optimized out>) at src/proxy/http/ConnectingEntry.cc:106
#12 0x000000000080ce84 in Continuation::handleEvent (data=0x7f4c1649ab58, event=102, this=0x7f528ba89e00) at include/iocore/eventsystem/Continuation.h:228
#13 Continuation::handleEvent (data=0x7f4c1649ab58, event=102, this=0x7f528ba89e00) at include/iocore/eventsystem/Continuation.h:224
#14 read_signal_and_update (event=102, vc=0x7f4c1649a900) at src/iocore/net/UnixNetVConnection.cc:95
#15 0x000000000080f131 in read_signal_done (vc=0x7f4c1649a900, nh=0x7f5c61254590, event=102) at src/iocore/net/UnixNetVConnection.cc:162
#16 UnixNetVConnection::readSignalDone (this=this@entry=0x7f4c1649a900, event=event@entry=102, nh=nh@entry=0x7f5c61254590) at src/iocore/net/UnixNetVConnection.cc:983
#17 0x00000000007d965c in SSLNetVConnection::net_read_io (this=0x7f4c1649a900, nh=0x7f5c61254590, lthread=<optimized out>) at src/iocore/net/SSLNetVConnection.cc:694
#18 0x000000000083b088 in NetHandler::process_ready_list (this=this@entry=0x7f5c61254590) at src/iocore/net/NetHandler.cc:286
#19 0x000000000083b38a in NetHandler::waitForActivity (this=0x7f5c61254590, timeout=<optimized out>) at src/iocore/net/NetHandler.cc:374
#20 0x0000000000876eac in EThread::execute_regular (this=this@entry=0x7f5c61253a00) at include/tsutil/Histogram.h:156
#21 0x0000000000877069 in EThread::execute (this=0x7f5c61253a00) at src/iocore/eventsystem/UnixEThread.cc:348
#22 EThread::execute (this=0x7f5c61253a00) at src/iocore/eventsystem/UnixEThread.cc:326
#23 0x0000000000873df2 in spawn_thread_internal (a=0x7f5c67a81700) at src/iocore/eventsystem/Thread.cc:75
#24 0x00007f5c68dd01ca in start_thread () from /lib64/libpthread.so.0
#25 0x00007f5c68a3ce73 in clone () from /lib64/libc.so.6

Here's the offending server_entry:

(gdb) f 4
#4  0x0000000000596f00 in HttpSM::attach_server_session (this=this@entry=0x7f4970f0e000) at src/proxy/http/HttpSM.cc:6513
6513      hsm_release_assert(server_entry == nullptr);
(gdb) p *server_entry
$4 = {
  vc = 0x7f4d25d1ef90, 
  read_buffer = 0x0, 
  write_buffer = 0x7f528c9c6800, 
  read_vio = 0x7f2b41fd9e58, 
  write_vio = 0x7f2b41fd9ec0, 
  vc_read_handler = (int (HttpSM::*)(HttpSM * const, int, void *)) 0x599ab0 <HttpSM::state_read_server_response_header(int, void*)>, 
  vc_write_handler = (int (HttpSM::*)(HttpSM * const, int, void *)) 0x594ae0 <HttpSM::state_send_server_request_header(int, void*)>, 
  vc_type = HTTP_SERVER_VC, 
  sm = 0x7f4970f0e000, 
  eos = false, 
  in_tunnel = true
}

I'll attach the entire history variable, but here are the interesting parts at the end:

    }, {
      location = {
        file = 0x8e8a50 "/sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc",
        func = 0x8e8808 "add_to_existing_request",
        line = 2198
      },
      event = 34463,
      reentrancy = 13
    }, {
      location = {
        file = 0x8e8a50 "/sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc",
        func = 0x8e7e7c "do_cache_lookup_and_read",
        line = 4985
      },
      event = 0,
      reentrancy = 8
    }, {
      location = {
        file = 0x8e8a50 "/sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc",
        func = 0x8e89fc "state_http_server_open",
        line = 1749
      },
      event = 104,
      reentrancy = 1
...
    }, {
      location = {
        file = 0x8e8a50 "/sd/workspace/src/git.ouryahoo.com/Edge/build/_scm/trafficserver10.0/src/proxy/http/HttpSM.cc",
        func = 0x8e89fc "state_http_server_open",
        line = 1749
      },
      event = 1400,
      reentrancy = 1
    }, {

Notice that add_to_existing_request which attempts to handle connection pooling, followed eventually by state_http_server_open with an event of 104, which is VC_EVENT_EOS. In these situations, we don't seem to handle the pooling logic correctly, and we try to attach a transaction to what is presumably a closed origin connection.

That is, it seems like we follow this progression:

  1. HttpSM::add_to_existing_request: https://github.com/apache/trafficserver/blob/71ff1ec5cecc871d6d0db7163b6a550e5f737c94/src/proxy/http/HttpSM.cc#L2234-L2236
  2. HttpSM::state_http_server_open EOS event: https://github.com/apache/trafficserver/blob/71ff1ec5cecc871d6d0db7163b6a550e5f737c94/src/proxy/http/HttpSM.cc#L1832

bneradt avatar May 12 '25 21:05 bneradt

history.txt

bneradt avatar May 13 '25 16:05 bneradt

Here's an example bt of a previous event from the client triggering an EOS via HttpSM::state_http_server_open that would put it in the state where the HttpSM::attach_server_session as the original bt indicates:

#0  0x00007f9901eb0acf in raise () from /lib64/libc.so.6
#1  0x00007f9901e83ea5 in abort () from /lib64/libc.so.6
#2  0x00007f99036aa44c in ink_abort (message_format=message_format@entry=0x7f99036bdaa7 "%s:%d: failed assertion `%s`") at src/tscore/ink_error.cc:99
#3  0x00007f99036a7906 in _ink_assert (expression=expression@entry=0x8fb4e9 "0", file=file@entry=0x8d77d8 "src/proxy/http/HttpSM.cc", line=line@entry=1829)
    at src/tscore/ink_assert.cc:35
#4  0x000000000059c272 in HttpSM::state_http_server_open (this=0x7f8d147b2000, event=104, data=0x0) at src/proxy/http/HttpSM.cc:1829
#5  0x0000000000597f3f in HttpSM::main_handler (data=0x0, event=104, this=0x7f8d147b2000) at src/proxy/http/HttpSM.cc:2661
#6  HttpSM::main_handler (this=0x7f8d147b2000, event=104, data=0x0) at src/proxy/http/HttpSM.cc:2631
#7  0x000000000064d4f0 in Continuation::handleEvent (data=0x0, event=104, this=0x7f8d147b2000) at include/iocore/eventsystem/Continuation.h:228
#8  Continuation::handleEvent (data=0x0, event=104, this=0x7f8d147b2000) at include/iocore/eventsystem/Continuation.h:224
#9  Http2Stream::initiating_close (this=<optimized out>) at src/proxy/http2/Http2Stream.cc:680
#10 0x000000000063f653 in Http2ConnectionState::rcv_frame (this=this@entry=0x7f8f78857720, frame=frame@entry=0x7f98f0bfd390) at src/proxy/http2/Http2ConnectionState.cc:1496
#11 0x0000000000636ba9 in Http2CommonSession::do_complete_frame_read (this=this@entry=0x7f8f78857718) at src/proxy/http2/Http2CommonSession.cc:338
#12 0x0000000000637014 in Http2CommonSession::do_process_frame_read (this=0x7f8f78857718, vio=0x7f777f3fc1f8, inside_frame=<optimized out>) at src/proxy/http2/Http2CommonSession.cc:400
#13 0x000000000065cd6e in Http2ClientSession::main_event_handler (this=0x7f8f78857400, event=2259, edata=0x7f8cf3a56900) at src/proxy/http2/Http2ClientSession.cc:193
#14 0x000000000086590c in Continuation::handleEvent (data=0x7f8cf3a56900, event=2259, this=<optimized out>) at include/iocore/eventsystem/Continuation.h:228
#15 Continuation::handleEvent (data=0x7f8cf3a56900, event=2259, this=<optimized out>) at include/iocore/eventsystem/Continuation.h:224
#16 EThread::process_event (this=0x7f98fa0cdd40, e=0x7f8cf3a56900, calling_code=2259) at src/iocore/eventsystem/UnixEThread.cc:166
#17 0x000000000086669c in EThread::execute_regular (this=this@entry=0x7f98fa0cdd40) at src/iocore/eventsystem/UnixEThread.cc:273
#18 0x0000000000867199 in EThread::execute (this=0x7f98fa0cdd40) at src/iocore/eventsystem/UnixEThread.cc:358
#19 EThread::execute (this=0x7f98fa0cdd40) at src/iocore/eventsystem/UnixEThread.cc:335
#20 0x0000000000863d32 in spawn_thread_internal (a=0x7f99011ffa40) at src/iocore/eventsystem/Thread.cc:75
#21 0x00007f990222f1ca in start_thread () from /lib64/libpthread.so.0
#22 0x00007f9901e9be73 in clone () from /lib64/libc.so.6

I added an assertion in HttpSM::state_http_server_open to catch this. Note in this case, while opening an origin side connection, the client processed a frame which triggered a stream error. The handling of the error in Http2Stream::initiating_close resulted in an EOS event to the HttpSM in HttpSM::state_http_server_open

bneradt avatar May 21 '25 18:05 bneradt