
body_factory performance, configuration and errors

Open c-taylor opened this issue 3 years ago • 2 comments

(Tests performed on 9.0.x)

body_factory has several performance and correctness issues compared to serving similar objects from cache. For example, serving a 301 with a body is 2.6x to 5.5x faster from cache than from body_factory.

This is the opposite of what I expected: I would expect small, generated responses to be as fast as, or faster than, responses served from cache.

Config example

remap.config:
map https://example.com \
    https://www.example.com \
    @plugin=regex_remap.so @pparam=redirect.config          

redirect.config:
(.*) https://www.example.com$0 @status=301

Observed Issues

I ran several load tests and traces and observed the following:

  • Explicit locking limits performance
  • Use of ats_malloc rather than ProxyAllocator
  • Suppressing the body should not suppress the Location header or send empty responses
  • Cannot configure Cache-Control for returned responses

https://github.com/apache/trafficserver/blob/6d4f919b733d47d4e0afc5243e6d863853c38bb6/proxy/http/HttpBodyFactory.cc#L86

// The body factory can be reconfigured dynamically by a manager    //
// callback, so locking is required.  The callback takes a lock,    //
// and the user entry points take a lock.  These locks may limit    //
// the speed of error page generation.  

When performance-testing body_factory in its default mode, the first thing you notice in a perf trace is the explicit lock/mutex: threads contend on this lock instead of doing useful work.
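The comment above describes a read-mostly structure that takes a lock on every lookup. A common alternative, sketched here in C++ under stated assumptions, is to publish an immutable snapshot on reconfiguration and let each thread cache it, so the per-request path is a single atomic load and the mutex is only taken after a reload. `TemplateTable`, `reload()`, `snapshot()` and `lookup()` are hypothetical names, not ATS APIs, and this is not how HttpBodyFactory is implemented today.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for the body factory's template set: name -> body text.
using TemplateTable = std::map<std::string, std::string>;

std::mutex g_mu;  // guards g_current; taken by writers and by stale readers only
std::shared_ptr<const TemplateTable> g_current = std::make_shared<const TemplateTable>();
std::atomic<std::uint64_t> g_version{0};  // bumped on every reload

// Rare path (e.g. a reconfiguration callback): build the new table off to the
// side, then publish it and bump the version while holding the mutex.
void reload(TemplateTable fresh) {
  auto next = std::make_shared<const TemplateTable>(std::move(fresh));
  std::lock_guard<std::mutex> lock(g_mu);
  g_current = std::move(next);
  g_version.fetch_add(1, std::memory_order_release);
}

// Hot path: one atomic load per call. The mutex is only taken when this
// thread's cached snapshot is missing or older than the published version.
// The returned reference stays valid until this thread calls snapshot() again.
const TemplateTable &snapshot() {
  thread_local std::uint64_t seen = 0;
  thread_local std::shared_ptr<const TemplateTable> cached;
  const std::uint64_t v = g_version.load(std::memory_order_acquire);
  if (!cached || v != seen) {
    std::lock_guard<std::mutex> lock(g_mu);
    cached = g_current;
    seen = g_version.load(std::memory_order_relaxed);  // consistent with g_current under the lock
  }
  return *cached;
}

std::string lookup(const std::string &name) {
  const TemplateTable &t = snapshot();
  auto it = t.find(name);
  return it == t.end() ? std::string() : it->second;
}

}  // namespace sketch
```

A plain `std::atomic_load` on one shared `std::shared_ptr` looks simpler, but some standard library implementations back it with an internal mutex pool, and every reader would still increment a single shared reference count, which reintroduces the contention this is meant to avoid.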

After 'suppressing' responses, the lock contention disappears, but you run into the second point above: body_factory uses ats_malloc, seemingly on a per-request basis, so overall performance is still significantly lower than for a cached object. In my case it is even lower than without suppression! This latter rps drop might also be related to client behaviour when receiving an empty response from the server rather than the expected 301. Notably, the 301 still appears as expected in the log, but the client never sees it on the wire. See the curl output below.
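The per-request allocation cost comes from going to the general-purpose allocator every time. The idea behind the ProxyAllocator request is a per-thread free list: buffers released on a thread are handed back out to later requests on that same thread, so the steady state does no malloc/free and touches no shared state. The C++ sketch below illustrates only that idea; `ThreadLocalBufferPool`, `acquire()`, `release()` and the cap of 64 cached buffers are hypothetical and are not ProxyAllocator's actual interface.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical fixed-size buffer pool with one free list per thread
// (a sketch of the idea, not ATS's ProxyAllocator/ClassAllocator API).
template <std::size_t BufSize>
class ThreadLocalBufferPool {
public:
  static void *acquire() {
    std::vector<void *> &fl = free_list();
    if (!fl.empty()) {  // fast path: reuse a buffer this thread released earlier
      void *p = fl.back();
      fl.pop_back();
      return p;
    }
    void *p = std::malloc(BufSize);  // slow path: only until the pool warms up
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }

  static void release(void *p) {
    std::vector<void *> &fl = free_list();
    if (fl.size() < kMaxCached) fl.push_back(p);  // keep for the next request
    else std::free(p);                            // cap memory held per thread
  }

  static std::size_t cached() { return free_list().size(); }

private:
  static constexpr std::size_t kMaxCached = 64;

  struct FreeList {
    std::vector<void *> bufs;
    FreeList() { bufs.reserve(kMaxCached); }  // push_back never reallocates below the cap
    ~FreeList() { for (void *p : bufs) std::free(p); }
  };

  static std::vector<void *> &free_list() {
    thread_local FreeList fl;  // one list per thread, no locking needed
    return fl.bufs;
  }
};
```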

Default:

> GET / HTTP/1.1
> Host: example.com
> User-Agent: curl/7.70.0
> Accept: */*
>
< HTTP/1.1 301 Redirect
< Date: Thu, 02 Sep 2021 10:07:30 GMT
< Connection: keep-alive
< Via: http/1.1 server.example.com (ApacheTrafficServer/9.0.3)
< Server: ATS/9.0.3
< Cache-Control: no-store
< Location: https://www.example.com/
< Content-Type: text/html
< Content-Language: en
< Content-Length: 304
<
<HTML>
<HEAD>
<TITLE>Document Has Moved</TITLE>
</HEAD>

<BODY BGCOLOR="white" FGCOLOR="black">
<H1>Document Has Moved</H1>
<HR>

<FONT FACE="Helvetica,Arial"><B>
Description: The document you requested has moved to a new location.  The new location is "https://www.example.com/".
</B></FONT>
<HR>
</BODY>
* Connection #0 to host example.com left intact

Suppressed: proxy.config.body_factory.response_suppression_mode INT 1

> GET / HTTP/1.1
> Host: example.com
> User-Agent: curl/7.70.0
> Accept: */*
>
* TLSv1.3 (IN), TLS alert, close notify (256):
* Empty reply from server

Test details

  • High transaction rate test
  • Use persistent connections
  • body_factory: 301 plus body
  • body_factory with suppression: As above but set proxy.config.body_factory.response_suppression_mode INT 1
  • cache: 256B object from cache

Results

  • cache: 195,000 rps
  • body_factory: 74,000 rps
  • body_factory (suppression): 35,000 rps <<-- empty responses
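The 2.6x to 5.5x range stated at the top follows directly from these throughput figures, dividing the cache result by each body_factory result:

$$\frac{195{,}000}{74{,}000} \approx 2.64, \qquad \frac{195{,}000}{35{,}000} \approx 5.57$$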

Wants

I want:

  • To disable (not suppress) body_factory for specific response codes
  • For body disabling/suppression not to send empty responses
  • To disable the feature that requires the lock, or remove the need for locking altogether
  • To change the Cache-Control of body_factory generated responses
  • For body_factory to use ProxyAllocator, to reduce allocations and use thread-local memory
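To make the first, second and fourth wants concrete, the C++ sketch below shows what a "body disabled" redirect could look like on the wire: the status line, Location and a configurable Cache-Control are still sent, and Content-Length: 0 marks the empty body as complete so a keep-alive client gets a valid 301 instead of an empty reply. `GeneratedResponsePolicy`, `body_disabled`, `cache_control` and `build_redirect()` are hypothetical names for illustration, not existing ATS settings or functions.

```cpp
#include <set>
#include <string>

// Hypothetical per-status policy (none of these are real ATS config names).
struct GeneratedResponsePolicy {
  std::set<int> body_disabled;             // statuses sent with an empty body
  std::string cache_control = "no-store";  // value for the Cache-Control header
};

// Build a complete HTTP/1.1 redirect response. Disabling the body never drops
// the headers: Location and Cache-Control are always emitted, and
// Content-Length always matches the payload that follows.
std::string build_redirect(int status, const std::string &reason,
                           const std::string &location, const std::string &body,
                           const GeneratedResponsePolicy &policy) {
  const bool no_body = policy.body_disabled.count(status) != 0;
  const std::string payload = no_body ? std::string() : body;

  std::string r = "HTTP/1.1 " + std::to_string(status) + " " + reason + "\r\n";
  r += "Location: " + location + "\r\n";
  r += "Cache-Control: " + policy.cache_control + "\r\n";
  if (!payload.empty()) r += "Content-Type: text/html\r\n";
  r += "Content-Length: " + std::to_string(payload.size()) + "\r\n\r\n";
  r += payload;
  return r;
}
```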

c-taylor, Sep 02 '21 11:09

@zwoop has been complaining about this for years. The internals are terrible, even for ATS. I've done some work on this but hit a blocker I haven't had time to get through. I think I have a plan now, but I'm not sure when I can find the time to do the implementation.

SolidWallOfCode, Sep 13 '21 23:09

This issue has been automatically marked as stale because it has not had recent activity. Marking it stale to flag it for further consideration by the community.

github-actions[bot], Sep 15 '22 02:09