sentry-php icon indicating copy to clipboard operation
sentry-php copied to clipboard

Sentry ratelimiting increases application response time to at least 3.5 seconds

Open blackwolf12333 opened this issue 3 years ago • 20 comments

Environment

PHP 7.4.13 Sentry SaaS (sentry.io)

We are using an in-house framework, so no laravel or symphony and we don't use promises in the application itself (not sure if that's relevant but I thought I'd mention it).

Which SDK and version?

$ composer show | grep sentry

sentry/sdk                           3.1.0     This is a metapackage shipping sentry/sentry with a recommended HTTP client.
sentry/sentry                        3.1.5     A PHP SDK for Sentry (http://sentry.io)

Steps to Reproduce

For debugging purposes we wrote a small python script that responds with a http 429 code to emulate being rate limited by sentry locally. We then made sure that in the HttpTransport we pointed to the local server that returns a 429 instead of the sentry store api endpoint. From there we were able to debug what happens when sentry rate limits our servers. Please let me know if you think we did something wrong in this debugging setup.

While stepping through the code we found that when sentry sends a 429 indicating we are rate limited the sentry SDK with the recommended HTTP client will retry the request 3 times. Each time it does a usleep() with a time that comes from RetryPlugin::defaultDelay in vendor/php-http/client-common/src/Plugin/RetryPlugin.php. This call to usleep causes the application to wait for a total of 3.5 seconds doing absolutely nothing, including not sending finishing a response while sentry is doing it's thing.

#!/usr/bin/env python
import string,cgi,time, datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHandler(BaseHTTPRequestHandler):
  def do_GET(self):
    self.send_response(503) # let bots know whats up
    self.send_header('Content-type','text/html')
    self.send_header('Retry-After','0')
    self.end_headers()
    self.send_error(429)

class TimeoutingHTTPServer(HTTPServer):
  def finish_request(self, request, client_address):
    request.settimeout(5) # Really short timeout as there is only 1 thread
    HTTPServer.finish_request(self, request, client_address)

def main():
  try:
    server = TimeoutingHTTPServer(('localhost',83), MyHandler)
    server.serve_forever()
  except KeyboardInterrupt:
    server.socket.close()

if __name__ == '__main__':
  main()

Expected Result

We expected sentry to handle the sending of events on a shutdown handler, as is indicated in the doc comment on HttpTransport: This transport sends the events using a syncronous HTTP client that will delay sending of the requests until the shutdown of the application.

Actual Result

Last time we encountered this rate limiting our application slowed down significantly because each backend request took at least 3.5 seconds to complete.

blackwolf12333 avatar Mar 30 '21 07:03 blackwolf12333

You can disable the retrying of events by setting the send_attempts to 0

ste93cry avatar Apr 02 '21 20:04 ste93cry

@ste93cry Thanks that does seem to help. But we still have the remaining question whether the default behavior makes sense for PHP, given that it means adding 3.5 seconds on top of the normal response time of a PHP backend.

send_attempts is also not documented according to a colleague of mine, so I suppose that should be addressed as well.

blackwolf12333 avatar Apr 06 '21 11:04 blackwolf12333

we still have the remaining question whether the default behavior makes sense for PHP

In the past version 2.x, the sending of the events was delayed until the shutdown of the application, which means that when you use PHP-FPM, it should likely happen after the response has been sent back to the client. Now, since sending the events is a syncronous operation, it definitely makes less sense. However, changing the default without releasing a new major version is a change of behaviour that would be unexpected and therefore cannot be done.

send_attempts is also not documented according to a colleague of mine, so I suppose that should be addressed as well.

Indeed, an issue tracking this should be opened in getsentry/sentry-docs: I know that in these times the guys at Sentry are working to improve the documentation of both the PHP and the Symfony SDK, so it's a good time to tell them that this needs to be addressed as well.

ste93cry avatar Apr 07 '21 17:04 ste93cry

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

github-actions[bot] avatar Jan 12 '22 21:01 github-actions[bot]

As said, changing the default is not feasible. Doc issue reported as https://github.com/getsentry/sentry-docs/issues/4579

Jean85 avatar Jan 12 '22 23:01 Jean85

A 429 should not be retried, at least not using the normal retries that should be reserved for 5xx errors or connection errors.

See also: https://develop.sentry.dev/sdk/rate-limiting/.

I will make sure send_attempts gets documented but I don't think the SDK is handling the 429 correctly either :)

stayallive avatar Jan 14 '22 09:01 stayallive

I will make sure send_attempts gets documented but I don't think the SDK is handling the 429 correctly either :)

No we don't, you're right! Opened #1279 for that.

Jean85 avatar Jan 15 '22 15:01 Jean85

@blackwolf12333 I think your Python script is incorrect...

I tried it with this one:

import string,cgi,time, datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

class HttpHandler(BaseHTTPRequestHandler):
  def do_POST(self):
    self.send_response(429, 'The response body')
    self.end_headers()

class TimeoutingHTTPServer(HTTPServer):
  def finish_request(self, request, client_address):
    request.settimeout(5)
    HTTPServer.finish_request(self, request, client_address)

def main():
  try:
    server = TimeoutingHTTPServer(('localhost', 8888), HttpHandler)
    server.serve_forever()
  except KeyboardInterrupt:
    server.socket.close()

if __name__ == '__main__':
  main()

Run it using python3 yourfilename.py and tested using the following PHP code:

<?php

require __DIR__ . '/vendor/autoload.php';

Sentry\init([
    'dsn' => 'http://[email protected]:8888/sentry/1',
]);

Sentry\captureMessage('Test!');

This results in a single request. If the status code is changed to 500 (in send_response) it does do 4 requests (1 original and 3 retries).

BUT, only if I remove the ErrorPlugin from our HTTP client...

So something fishy IS happening but it rather retries too little instead of too much from the looks of it...

@blackwolf12333 which HTTP client are you using? I have tested this with php-http/curl-client.

stayallive avatar Jan 15 '22 16:01 stayallive

Hmm, I'm not sure what the difference is between send_error and send_response in python to be honest. I think we are using guzzlehttp/guzzle, but I'll double check tomorrow.

blackwolf12333 avatar Jan 16 '22 20:01 blackwolf12333

The biggest problem is that you listened for GET requests, Sentry uses POST. So maybe the HTTP handler code was fine but for the wrong HTTP 😅 The code I pasted should do the correct thing as far as I can tell.

stayallive avatar Jan 16 '22 20:01 stayallive

@stayallive I can reproduce the issue in our project with your python server. But I'll be honest it is a bit of a legacy code base and I think we have 3 separate ways of doing http requests, so it's a bit tricky to figure out which one is actually in use. I'll try to debug and get back to you.

blackwolf12333 avatar Jan 17 '22 08:01 blackwolf12333

Looks like the sentry sdk sets up the RetryPlugin here: https://github.com/getsentry/sentry-php/blob/master/src/HttpClient/HttpClientFactory.php#L150

We use the HttplugClient which is wrapping a symphony CurlHttpClient. This is all set up by the sentry-sdk itself, we have not set a custom http client for sentry to use.

blackwolf12333 avatar Jan 17 '22 08:01 blackwolf12333

We are still using version 3.1.5.

blackwolf12333 avatar Jan 17 '22 08:01 blackwolf12333

Okay, as far as I can tell there is no special handling for 429 errors that is different from 500+ errors in the RetryPlugin as set up by the sentry sdk. The RetryPlugin uses an exceptionDecider which is configured with the default decider with just returns true. So all requests will be retried no matter what http error code. If you change the order in which the plugins are initialized, first the ErrorPlugin and then the RetryPlugin the retry plugin is never used on a 429.

blackwolf12333 avatar Jan 17 '22 08:01 blackwolf12333

I hope @Jean85 and maybe @ste93cry even more can jump in on this since the RetryPlugin only retries 500-599 HTTP errors according to their default options but maybe something together with the ErrorPlugin is causing issues. I am not familiar with the http plug plugins so I hope the ones that brought it in have some insight to share here about what is happening 😄

@blackwolf12333, an obvious thing maybe to try is to be on the latest version, just in case that matters!

stayallive avatar Jan 17 '22 09:01 stayallive

@stayallive you kept insisting that the RetryPlugin only retries on 500-599 errors while when I was debugging it most certainly did not, but I realized that we might not be using a recent version of php-http/client-common and it turns out we indeed don't. We still have version 1.10.0 in our vendor folder. On this version the RetryPlugin does just retry everything no matter the error code.

Depending on how much effort you guys want to put into supporting legacy applications I think you can safely close this issue as it indeed does not happen on projects with recent versions of all their dependencies.

We are already using the configuration option to set the retry attempts to 0 so at the moment this is not impacting us anymore. I'm wondering if it would be useful to document this behavior when using php-http/client-common on 1.x versions?

blackwolf12333 avatar Jan 17 '22 09:01 blackwolf12333

It looks like maybe the version constraints in your composer.json are not strict enough? This bug definitely happens with client-common ^1.5.

"php-http/client-common": "^1.5|^2.0",

blackwolf12333 avatar Jan 17 '22 09:01 blackwolf12333

Besides updating (and maybe requiring a minimum version of the package to avoid such issues?), I think it would be more appropriate also to handle 429 limits well, as I remember that Sentry has a different rate limit per event category. I will try to find the time to implement proper rate limiting handling in the next weeks so that we also stay coherent with the other SDKs

ste93cry avatar Jan 17 '22 09:01 ste93cry

It's all in the changelog - https://github.com/php-http/client-common/blob/master/CHANGELOG.md - but interesting how wildly different RetryPlugin is on different versions.

mfb avatar Jan 17 '22 10:01 mfb

With current version of the Retry Plugin, I confirmed that there is no try for 429 response (testing w/ raw webserver response code, not a real Sentry). And there is a retry for 500 error or DNS failure. If you have a totally broken system where DNS lookups fail after 5 seconds (Sentry\HttpClient\HttpClientFactory::DEFAULT_HTTP_TIMEOUT), then you'll have (1 send + 3 resend attempts) * 5 second DNS timeouts + RetryPlugin sleeping for 0.5, 1 and 2 seconds = 23.5 seconds total waiting for Sentry.

mfb avatar Jan 17 '22 11:01 mfb

I think this issue could be closed, as the situation should be dramatically improved. The SDK will no longer retry failed send attempts, and there is now a configurable idle timeout, which defaults to 2 seconds.

mfb avatar Aug 23 '22 21:08 mfb