trafficserver icon indicating copy to clipboard operation
trafficserver copied to clipboard

Failed assertion in http tunnel

Open mrdg opened this issue 8 months ago • 7 comments

This is on version 9.2.10

Fatal: HttpTunnel.cc:991: failed assertion `c->skip_bytes <= c->buffer_reader->read_avail()`

backtrace

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000152a601e127e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x0000152a601c48ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000152a60e71e1e in ink_abort (message_format=message_format@entry=0x152a60ee7cdb "%s:%d: failed assertion `%s`") at ink_error.cc:99
#6  0x0000152a60e6e04b in _ink_assert (expression=<optimized out>, file=<optimized out>, line=<optimized out>) at ink_assert.cc:37
#7  0x0000560eb7a58d08 in HttpTunnel::producer_run (this=0x560fcfe5db08, p=0x560fcfe5dd10) at HttpTunnel.cc:991
#8  0x0000560eb7a59b40 in HttpTunnel::tunnel_run (this=0x560fcfe5db08, p_arg=0x560fcfe5dd10) at HttpTunnel.cc:818
#9  0x0000560eb79e439e in HttpSM::state_api_callout (this=this@entry=0x560fcfe5cb00, event=event@entry=60000, data=data@entry=0x0) at HttpSM.cc:1648
#10 0x0000560eb79eb989 in HttpSM::state_api_callback (this=this@entry=0x560fcfe5cb00, event=event@entry=60000, data=data@entry=0x0) at HttpSM.cc:1453
#11 0x0000560eb7962dee in TSHttpTxnReenable (txnp=0x560fcfe5cb00, event=TS_EVENT_HTTP_CONTINUE) at traffic_server/InkAPI.cc:6316
#12 0x0000152a540b625d in transform_plugin (contp=0x1528a21f87a0, event=TS_EVENT_HTTP_READ_RESPONSE_HDR, edata=0x560fcfe5cb00) at compress/compress.cc:908
#13 0x0000560eb7946148 in INKContInternal::handle_event (this=0x1528a21f87a0, event=60006, edata=0x560fcfe5cb00) at traffic_server/InkAPI.cc:1135
#14 0x0000560eb795dc6f in Continuation::handleEvent (data=0x560fcfe5cb00, event=60006, this=0x1528a21f87a0) at /trafficserver/iocore/eventsystem/I_Continuation.h:227
#15 Continuation::handleEvent (data=0x560fcfe5cb00, event=60006, this=0x1528a21f87a0) at /trafficserver/iocore/eventsystem/I_Continuation.h:223
#16 APIHook::invoke (this=this@entry=0x152a3839e1c0, event=60006, edata=edata@entry=0x560fcfe5cb00) at traffic_server/InkAPI.cc:1372
#17 0x0000560eb79e3d41 in HttpSM::state_api_callout (this=this@entry=0x560fcfe5cb00, event=event@entry=60000, data=data@entry=0x0) at HttpSM.cc:1576
#18 0x0000560eb79eb989 in HttpSM::state_api_callback (this=this@entry=0x560fcfe5cb00, event=event@entry=60000, data=data@entry=0x0) at HttpSM.cc:1453
#19 0x0000560eb7962dee in TSHttpTxnReenable (txnp=0x560fcfe5cb00, event=TS_EVENT_HTTP_CONTINUE) at traffic_server/InkAPI.cc:6316
#20 0x0000152a54247ef0 in (anonymous namespace)::invokePluginForEvent (event=TS_EVENT_HTTP_READ_RESPONSE_HDR, ats_txn_handle=0x560fcfe5cb00, plugin=0x152791318a10) at utils_internal.cc:160
#21 atscppapi::utils::internal::invokePluginForEvent (plugin=0x152791318a10, ats_txn_handle=0x560fcfe5cb00, event=TS_EVENT_HTTP_READ_RESPONSE_HDR) at utils_internal.cc:251
#22 0x0000152a54244cc9 in (anonymous namespace)::handleTransactionPluginEvents (cont=0x1529f98f0c40, event=TS_EVENT_HTTP_READ_RESPONSE_HDR, edata=0x560fcfe5cb00) at TransactionPlugin.cc:53
#23 0x0000560eb7946148 in INKContInternal::handle_event (this=0x1529f98f0c40, event=60006, edata=0x560fcfe5cb00) at traffic_server/InkAPI.cc:1135
#24 0x0000560eb795dc6f in Continuation::handleEvent (data=0x560fcfe5cb00, event=60006, this=0x1529f98f0c40) at /trafficserver/iocore/eventsystem/I_Continuation.h:227
#25 Continuation::handleEvent (data=0x560fcfe5cb00, event=60006, this=0x1529f98f0c40) at /trafficserver/iocore/eventsystem/I_Continuation.h:223
#26 APIHook::invoke (this=this@entry=0x152a407a84c0, event=60006, edata=edata@entry=0x560fcfe5cb00) at traffic_server/InkAPI.cc:1372
#27 0x0000560eb79e3d41 in HttpSM::state_api_callout (this=this@entry=0x560fcfe5cb00, event=event@entry=60000, data=data@entry=0x0) at HttpSM.cc:1576
#28 0x0000560eb79eb989 in HttpSM::state_api_callback (this=this@entry=0x560fcfe5cb00, event=event@entry=60000, data=data@entry=0x0) at HttpSM.cc:1453
#29 0x0000560eb7962dee in TSHttpTxnReenable (txnp=0x560fcfe5cb00, event=TS_EVENT_HTTP_CONTINUE) at traffic_server/InkAPI.cc:6316
#30 0x0000152a54248163 in (anonymous namespace)::handleTransactionEvents (cont=<optimized out>, event=<optimized out>, edata=0x560fcfe5cb00) at utils_internal.cc:111
#31 0x0000560eb7946148 in INKContInternal::handle_event (this=0x560eb9b38b00, event=60006, edata=0x560fcfe5cb00) at traffic_server/InkAPI.cc:1135
#32 0x0000560eb795dc6f in Continuation::handleEvent (data=0x560fcfe5cb00, event=60006, this=0x560eb9b38b00) at /trafficserver/iocore/eventsystem/I_Continuation.h:227
#33 Continuation::handleEvent (data=0x560fcfe5cb00, event=60006, this=0x560eb9b38b00) at /trafficserver/iocore/eventsystem/I_Continuation.h:223
#34 APIHook::invoke (this=this@entry=0x560eb9b3af40, event=60006, edata=edata@entry=0x560fcfe5cb00) at traffic_server/InkAPI.cc:1372
#35 0x0000560eb79e3d41 in HttpSM::state_api_callout (this=0x560fcfe5cb00, event=<optimized out>, data=<optimized out>) at HttpSM.cc:1576
#36 0x0000560eb79e5581 in HttpSM::state_read_server_response_header (this=0x560fcfe5cb00, event=<optimized out>, data=<optimized out>) at HttpSM.cc:2154
#37 0x0000560eb79eb6c0 in HttpSM::main_handler (this=0x560fcfe5cb00, event=100, data=0x15299407ec30) at HttpSM.cc:2764
#38 0x0000560eb7cc18bc in Continuation::handleEvent (data=0x15299407ec30, event=100, this=0x560fcfe5cb00) at /trafficserver/iocore/eventsystem/I_Continuation.h:227
#39 Continuation::handleEvent (data=0x15299407ec30, event=100, this=0x560fcfe5cb00) at /trafficserver/iocore/eventsystem/I_Continuation.h:223
#40 read_signal_and_update (event=100, vc=0x15299407ea40) at UnixNetVConnection.cc:83
#41 0x0000560eb7ccaed0 in read_from_net (nh=<optimized out>, vc=0x15299407ea40, thread=<optimized out>) at UnixNetVConnection.cc:323
#42 0x0000560eb7cb013c in NetHandler::process_ready_list (this=this@entry=0x152a5d1e8410) at UnixNet.cc:428
#43 0x0000560eb7cb056b in NetHandler::waitForActivity (this=0x152a5d1e8410, timeout=<optimized out>) at UnixNet.cc:565
#44 0x0000560eb7d12d79 in EThread::execute_regular (this=this@entry=0x152a5d1e4010) at UnixEThread.cc:292
#45 0x0000560eb7d12f51 in EThread::execute (this=0x152a5d1e4010) at UnixEThread.cc:353
#46 EThread::execute (this=0x152a5d1e4010) at UnixEThread.cc:331
#47 0x0000560eb7d109da in spawn_thread_internal (a=0x560eb9ad4ad0) at Thread.cc:78
#48 0x0000152a60238aa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#49 0x0000152a602c5c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Some notes from poking at the core dump

#9  0x0000560eb79e439e in HttpSM::state_api_callout (this=this@entry=0x560fcfe5cb00, event=event@entry=60000, data=data@entry=0x0) at HttpSM.cc:1648

(gdb) p t_state.hdr_info.response_content_length
$2 = 65

(gdb) p t_state.ranges[0]
$3 = {_start = 0, _end = 64, _done_byte = -1}

(gdb) p t_state.hdr_info.server_response.m_http.u.resp.m_status
$10 = 308

(gdb) p t_state.cache_info.action
$15 = HttpTransact::CACHE_DO_WRITE

(gdb) p t_state.cache_info.write_status
$16 = HttpTransact::CACHE_WRITE_IN_PROGRESS

(gdb) p t_state.cache_info.write_lock_state
$17 = HttpTransact::CACHE_WL_SUCCESS
#7  0x0000560eb7a58d08 in HttpTunnel::producer_run (this=0x560fcfe5db08, p=0x560fcfe5dd10) at HttpTunnel.cc:991

(gdb) p c->skip_bytes
$1 = 316

(gdb) p c->name
$11 = 0x560eb7d56b3b "cache write"

I've looked at a few of these crashes, and they were all range requests like above, with the range matching the content length of the server response. The server response code is always a 308.

mrdg avatar May 15 '25 10:05 mrdg

@mrdg Do you know if you have automatic redirection enabled? (The is_redirect_in_process field of the state machine will tell if one was triggered.) I may have added the redirect label in error - the automatic redirection has so many problems that I just assumed this is another one after seeing the response code is always a permanent redirect.

JosiahWI avatar May 15 '25 18:05 JosiahWI

@JosiahWI I checked a few core dumps and redirect_in_process was false in all of them so it seems to be something else.

mrdg avatar May 16 '25 13:05 mrdg

It might be related to this issue: #11975. What is your configuration for the compress plugin?

bryancall avatar May 19 '25 22:05 bryancall

It might be related to this issue: https://github.com/apache/trafficserver/pull/11975. What is your configuration for the compress plugin?

this is our compress.config

enabled true
remove-accept-encoding false
cache true
flush true

compressible-content-type text/*
compressible-content-type application/*
compressible-content-type image/svg*
compressible-content-type image/x-icon

supported-algorithms br,gzip,deflate

compressible-status-code 200
compressible-status-code 206
compressible-status-code 304
compressible-status-code 404

mrdg avatar May 20 '25 12:05 mrdg

If 206 Partial Content is related, below patches might be helpful, there're some bugs in compress plugin of 9.2.x.

Approach 1).

  • apply https://github.com/apache/trafficserver/pull/12116
  • change compressible-status-code config to
compressible-status-code 200 304 404

or

Approach 2).

  • apply #11975 and #12121

masaori335 avatar Jun 02 '25 23:06 masaori335

If 206 Partial Content is related, below patches might be helpful, there're some bugs in compress plugin of 9.2.x.

Approach 1).

* apply [[compress] Fix compressible-status-code config #12116](https://github.com/apache/trafficserver/pull/12116)

* change `compressible-status-code` config to
compressible-status-code 200 304 404

or

Approach 2).

* apply [[compress] Add range request control option #11975](https://github.com/apache/trafficserver/pull/11975) and [[compress] Add an option to do not compress partial object #12121](https://github.com/apache/trafficserver/pull/12121)

Howdy folks, thank you for the help so far 🙇 we went with approach 1, unfortunately it did not resolve the failing asserts

We are not sure if approach 2 will help here as we are only seeing this for origin responses which has a 308 status code and we do not include 308 in our compressible-status-code, which is currently set to compressible-status-code 200,304,404

is there another approach that you think could help here?

JakeChampion avatar Jun 09 '25 11:06 JakeChampion

In case it is helpful to know, when using gdb and go to the frame for HttpTunnel::producer_run at HttpTunnel.cc:991 we can see that the c->buffer_reader has an empty first block and that the second block contains the full response body:

(gdb) p c->buffer_reader.block.m_ptr._end - c->buffer_reader.block.m_ptr._start
$32 = 0
(gdb) p c->buffer_reader.block.m_ptr.next.m_ptr._end - c->buffer_reader.block.m_ptr.next.m_ptr._start
$33 = 67

c->vc_type is set to HT_CACHE_WRITE and p->chunking_action set to TCA_PASSTHRU_DECHUNKED_CONTENT and sm->t_state.range_setup is set to HttpTransact::RANGE_REQUESTED and p c->name "cache write", which means that the skip_bytes was set by this function call in HttpSM.cc:6316 :

setup_cache_write_transfer(&cache_sm, server_entry->vc, &t_state.cache_info.object_store, client_response_hdr_bytes,
                                 "cache write");

Is the first block meant to contain something like the response headers by any chance?

We think the issue has something to do with the user-agent making the incoming client request, cancelling their request before recieving a response, potentially there is some race condition being hit somewhere in ats

JakeChampion avatar Jun 09 '25 14:06 JakeChampion