sentry-php
Sentry ratelimiting increases application response time to at least 3.5 seconds
Environment
PHP 7.4.13
Sentry SaaS (sentry.io)
We are using an in-house framework, so no Laravel or Symfony, and we don't use promises in the application itself (not sure if that's relevant, but I thought I'd mention it).
Which SDK and version?
$ composer show | grep sentry
sentry/sdk 3.1.0 This is a metapackage shipping sentry/sentry with a recommended HTTP client.
sentry/sentry 3.1.5 A PHP SDK for Sentry (http://sentry.io)
Steps to Reproduce
For debugging purposes we wrote a small Python script that responds with an HTTP 429 code to emulate being rate limited by Sentry locally. We then made sure that in the HttpTransport we pointed to the local server that returns a 429 instead of the Sentry store API endpoint. From there we were able to debug what happens when Sentry rate limits our servers. Please let me know if you think we did something wrong in this debugging setup.
While stepping through the code we found that when Sentry sends a 429 indicating we are rate limited, the SDK with the recommended HTTP client will retry the request 3 times. Each time it does a usleep() with a delay that comes from RetryPlugin::defaultDelay in vendor/php-http/client-common/src/Plugin/RetryPlugin.php. This call to usleep causes the application to wait for a total of 3.5 seconds doing absolutely nothing, not even finishing the response, while Sentry is doing its thing.
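For reference, the 3.5 seconds matches an exponential backoff that starts at 0.5 s and doubles on each retry. A quick sketch of that arithmetic (my reading of RetryPlugin's default delay; the exact formula may differ between client-common versions):

```python
def default_delay_us(retries: int) -> int:
    # Assumed default: 500 ms, doubled on each retry (0.5 s, 1 s, 2 s).
    return 500_000 * 2 ** retries

# Three retries after the original request:
total_us = sum(default_delay_us(r) for r in range(3))
print(total_us / 1_000_000)  # 3.5 seconds of usleep() in total
```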
#!/usr/bin/env python
import string, cgi, time, datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(503)  # let bots know what's up
        self.send_header('Content-type', 'text/html')
        self.send_header('Retry-After', '0')
        self.end_headers()
        self.send_error(429)

class TimeoutingHTTPServer(HTTPServer):
    def finish_request(self, request, client_address):
        request.settimeout(5)  # Really short timeout as there is only 1 thread
        HTTPServer.finish_request(self, request, client_address)

def main():
    try:
        server = TimeoutingHTTPServer(('localhost', 83), MyHandler)
        server.serve_forever()
    except KeyboardInterrupt:
        server.socket.close()

if __name__ == '__main__':
    main()
Expected Result
We expected Sentry to handle the sending of events in a shutdown handler, as indicated in the doc comment on HttpTransport: "This transport sends the events using a synchronous HTTP client that will delay sending of the requests until the shutdown of the application."
Actual Result
Last time we encountered this rate limiting, our application slowed down significantly because each backend request took at least 3.5 seconds to complete.
You can disable the retrying of events by setting send_attempts to 0.
@ste93cry Thanks that does seem to help. But we still have the remaining question whether the default behavior makes sense for PHP, given that it means adding 3.5 seconds on top of the normal response time of a PHP backend.
send_attempts is also not documented according to a colleague of mine, so I suppose that should be addressed as well.
we still have the remaining question whether the default behavior makes sense for PHP
In the past version 2.x, the sending of the events was delayed until the shutdown of the application, which means that when you use PHP-FPM, it would likely happen after the response had been sent back to the client. Now that sending the events is a synchronous operation, it definitely makes less sense. However, changing the default without releasing a new major version would be an unexpected behaviour change and therefore cannot be done.
send_attempts is also not documented according to a colleague of mine, so I suppose that should be addressed as well.
Indeed, an issue tracking this should be opened in getsentry/sentry-docs: I know that the folks at Sentry are currently working to improve the documentation of both the PHP and the Symfony SDK, so it's a good time to tell them that this needs to be addressed as well.
As said, changing the default is not feasible. Doc issue reported as https://github.com/getsentry/sentry-docs/issues/4579
A 429 should not be retried, at least not using the normal retries that should be reserved for 5xx errors or connection errors.
See also: https://develop.sentry.dev/sdk/rate-limiting/.
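That spec boils down to: on a 429, stop sending until the Retry-After deadline passes, rather than retrying inline. A minimal sketch of the idea (hypothetical names, not the sentry-php implementation):

```python
import time

class RateLimiter:
    """Drop events while a 429's Retry-After deadline has not passed."""

    def __init__(self) -> None:
        self._disabled_until = 0.0

    def handle_response(self, status: int, headers: dict) -> None:
        # On 429, record a deadline from Retry-After (in seconds) and
        # drop events until it passes; never sleep or retry in-request.
        if status == 429:
            delay = float(headers.get("Retry-After", 60))
            self._disabled_until = time.time() + delay

    def is_limited(self) -> bool:
        return time.time() < self._disabled_until
```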
I will make sure send_attempts gets documented, but I don't think the SDK is handling the 429 correctly either :)
I will make sure send_attempts gets documented but I don't think the SDK is handling the 429 correctly either :)
No we don't, you're right! Opened #1279 for that.
@blackwolf12333 I think your Python script is incorrect...
I tried it with this one:
import string, cgi, time, datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

class HttpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        self.send_response(429, 'The response body')
        self.end_headers()

class TimeoutingHTTPServer(HTTPServer):
    def finish_request(self, request, client_address):
        request.settimeout(5)
        HTTPServer.finish_request(self, request, client_address)

def main():
    try:
        server = TimeoutingHTTPServer(('localhost', 8888), HttpHandler)
        server.serve_forever()
    except KeyboardInterrupt:
        server.socket.close()

if __name__ == '__main__':
    main()
Run it using python3 yourfilename.py and test it using the following PHP code:
<?php

require __DIR__ . '/vendor/autoload.php';

Sentry\init([
    'dsn' => 'http://[email protected]:8888/sentry/1',
]);

Sentry\captureMessage('Test!');
This results in a single request. If the status code is changed to 500 (in send_response) it does 4 requests (1 original and 3 retries). BUT, only if I remove the ErrorPlugin from our HTTP client...
So something fishy IS happening, but from the looks of it, it retries too little rather than too much...
@blackwolf12333 which HTTP client are you using? I have tested this with php-http/curl-client.
Hmm, I'm not sure what the difference is between send_error and send_response in Python, to be honest. I think we are using guzzlehttp/guzzle, but I'll double check tomorrow.
The biggest problem is that you listened for GET requests, while Sentry uses POST. So maybe the HTTP handler code was fine, but for the wrong HTTP method 😅 The code I pasted should do the correct thing as far as I can tell.
@stayallive I can reproduce the issue in our project with your python server. But I'll be honest it is a bit of a legacy code base and I think we have 3 separate ways of doing http requests, so it's a bit tricky to figure out which one is actually in use. I'll try to debug and get back to you.
Looks like the Sentry SDK sets up the RetryPlugin here: https://github.com/getsentry/sentry-php/blob/master/src/HttpClient/HttpClientFactory.php#L150
We use the HttplugClient, which wraps a Symfony CurlHttpClient. This is all set up by the sentry-sdk itself; we have not set a custom HTTP client for Sentry to use.
We are still using version 3.1.5.
Okay, as far as I can tell there is no special handling for 429 errors that differs from 500+ errors in the RetryPlugin as set up by the Sentry SDK. The RetryPlugin uses an exceptionDecider which is configured with the default decider, which just returns true. So all requests will be retried no matter the HTTP error code. If you change the order in which the plugins are initialized, first the ErrorPlugin and then the RetryPlugin, the retry plugin is never used on a 429.
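Why the ordering matters can be shown with two wrappers in plain Python (a sketch of the HTTPlug plugin chain, not the actual classes): the "error" wrapper turns HTTP error statuses into exceptions, and the "retry" wrapper retries only when the inner call throws. With retry outside error, a 429 becomes an exception and gets retried; with error outside retry, the retry wrapper only ever sees a plain response and does nothing.

```python
def error_plugin(handler):
    # Turn HTTP error statuses into exceptions (like ErrorPlugin).
    def wrapped(request):
        status = handler(request)
        if status >= 400:
            raise RuntimeError(f"HTTP {status}")
        return status
    return wrapped

def retry_plugin(handler, attempts=3):
    # Retry whenever the inner handler throws (default decider: always True).
    def wrapped(request):
        for attempt in range(attempts + 1):
            try:
                return handler(request)
            except RuntimeError:
                if attempt == attempts:
                    raise
    return wrapped

calls = []
def transport(request):
    calls.append(request)
    return 429  # every request is rate limited

# RetryPlugin wrapping ErrorPlugin: the 429 raises inside, so it is retried.
chain = retry_plugin(error_plugin(transport))
try:
    chain("event")
except RuntimeError:
    pass
print(len(calls))  # 4: the original request plus three retries
```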
I hope @Jean85, and maybe even more @ste93cry, can jump in on this, since the RetryPlugin only retries 500-599 HTTP errors according to their default options, but maybe something in combination with the ErrorPlugin is causing issues. I am not familiar with the HTTPlug plugins, so I hope those who brought it in have some insight to share here about what is happening 😄
@blackwolf12333, an obvious thing maybe to try is to be on the latest version, just in case that matters!
@stayallive you kept insisting that the RetryPlugin only retries on 500-599 errors, while when I was debugging it most certainly did not, but I realized that we might not be using a recent version of php-http/client-common, and it turns out we indeed don't. We still have version 1.10.0 in our vendor folder. On that version the RetryPlugin just retries everything no matter the error code.
Depending on how much effort you want to put into supporting legacy applications, I think you can safely close this issue, as it indeed does not happen on projects with recent versions of all their dependencies.
We are already using the configuration option to set the retry attempts to 0, so at the moment this is not impacting us anymore. I'm wondering if it would be useful to document this behavior when using 1.x versions of php-http/client-common?
It looks like maybe the version constraints in your composer.json are not strict enough? This bug definitely happens with client-common ^1.5:
"php-http/client-common": "^1.5|^2.0",
Besides updating (and maybe requiring a minimum version of the package to avoid such issues?), I think it would also be appropriate to handle 429 limits properly, as I recall that Sentry has a different rate limit per event category. I will try to find the time to implement proper rate-limit handling in the coming weeks so that we stay consistent with the other SDKs.
It's all in the changelog - https://github.com/php-http/client-common/blob/master/CHANGELOG.md - but interesting how wildly different RetryPlugin is on different versions.
With current version of the Retry Plugin, I confirmed that there is no try for 429 response (testing w/ raw webserver response code, not a real Sentry). And there is a retry for 500 error or DNS failure. If you have a totally broken system where DNS lookups fail after 5 seconds (Sentry\HttpClient\HttpClientFactory::DEFAULT_HTTP_TIMEOUT), then you'll have (1 send + 3 resend attempts) * 5 second DNS timeouts + RetryPlugin sleeping for 0.5, 1 and 2 seconds = 23.5 seconds total waiting for Sentry.
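The arithmetic of that worst case, spelled out with the timeout and backoff values quoted above:

```python
send_attempts = 1 + 3        # one original send plus three retries
dns_timeout_s = 5.0          # DEFAULT_HTTP_TIMEOUT spent per attempt
backoff_s = [0.5, 1.0, 2.0]  # RetryPlugin sleeps between retries

total = send_attempts * dns_timeout_s + sum(backoff_s)
print(total)  # 23.5 seconds total waiting for Sentry
```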
I think this issue could be closed, as the situation should be dramatically improved. The SDK will no longer retry failed send attempts, and there is now a configurable idle timeout, which defaults to 2 seconds.