
Internal error in sentry_sdk

Open manuellazzari-cargoone opened this issue 1 year ago • 11 comments

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

I am randomly getting the following error from some of my Python services. The SDK seems to be sending events to sentry.io correctly, although I have no way to tell whether it is dropping some of them. The error seems to occur more often under load.

The service is mainly a Flask app on Python 3.11, run by gunicorn on a Kubernetes cluster with multiple instances on Google Cloud. All dependencies are basically up to date; I'm using sentry-sdk 1.40.2 and urllib3 2.0.7.

In an attempt to fix it, I tried customizing the HttpTransport used by the Sentry SDK as follows, with no luck.

try:
    logger.info('Initializing Sentry SDK...')
    sentry_sdk.init(
        ...,
        environment=...,
        traces_sample_rate=...,
        integrations=[FlaskIntegration(), RedisIntegration()],
    )

    import socket
    from sentry_sdk import transport
    from sentry_sdk.transport import HttpTransport
    from urllib3.connection import HTTPConnection

    class KeepAliveHttpTransport(HttpTransport):
        def _get_pool_options(self, ca_certs):
            opts = super()._get_pool_options(ca_certs)
            opts['socket_options'] = HTTPConnection.default_socket_options + [
                (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
                (socket.SOL_TCP, socket.TCP_KEEPIDLE, 45),
                (socket.SOL_TCP, socket.TCP_KEEPINTVL, 10),
                (socket.SOL_TCP, socket.TCP_KEEPCNT, 6),
            ]
            return opts


    logger.info('Forcing keep alive for Sentry SDK HTTP transport...')
    transport.HttpTransport = KeepAliveHttpTransport

except Exception as e:
    logger.info(f'Unable to Start Sentry due to error: {e}')

Expected Result

No connection errors

Actual Result

Internal error in sentry_sdk
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 537, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connection.py", line 466, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
    return real_getresponse(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 1386, in getresponse
    response.begin()
  File "/usr/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 294, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/transport.py", line 535, in send_envelope_wrapper
    self._send_envelope(envelope)
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/transport.py", line 434, in _send_envelope
    self._send_request(
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/transport.py", line 245, in _send_request
    response = self._pool.request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/_request_methods.py", line 144, in request
    return self.request_encode_body(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/_request_methods.py", line 279, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/poolmanager.py", line 444, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 537, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connection.py", line 466, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
    return real_getresponse(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 1386, in getresponse
    response.begin()
  File "/usr/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 294, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Product Area

Issues

Link

No response

DSN

https://[email protected]/4503976860385280

Version

No response

manuellazzari-cargoone avatar Feb 13 '24 08:02 manuellazzari-cargoone

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] avatar Feb 13 '24 08:02 getsantry[bot]

Routing to @getsentry/product-owners-issues for triage ⏲️

getsantry[bot] avatar Feb 13 '24 19:02 getsantry[bot]

Likely to be caused by https://github.com/getsentry/sentry-python/issues/2386

vartec avatar Feb 13 '24 23:02 vartec

Likely to be caused by #2386

@vartec how is that connected? My app does not crash, only the Sentry SDK is complaining...

manuellazzari-cargoone avatar Feb 14 '24 13:02 manuellazzari-cargoone

I am getting the same issue as above, but I haven't tried the custom HttpTransport yet. Basic specs: sentry-sdk[flask,django,celery]==1.29.0, Python 3.9.

kieran-sf avatar Feb 14 '24 23:02 kieran-sf

Hey @manuellazzari-cargoone and @kieran-sf! Thanks for reporting this.

My first suggestion would've been exactly what you tried @manuellazzari-cargoone with the custom socket options from https://github.com/getsentry/sentry-python/issues/1198#issuecomment-1802638463. Are you seeing at least some improvement?

I'm curious whether spacing out the events sent from the SDK makes a difference. Can you try overriding _send_request in your custom transport roughly like this?

import time

from sentry_sdk.transport import HttpTransport


class KeepAliveHttpTransport(HttpTransport):
    def _send_request(self, *args, **kwargs):
        time.sleep(0.01)  # space out outgoing requests slightly
        return super()._send_request(*args, **kwargs)
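
Putting everything together might look roughly like this -- a sketch that combines the keepalive socket options with the delay and hands the class to init() via its transport option, so the custom transport is used regardless of when sentry_sdk.transport.HttpTransport gets patched:

import socket
import time

import sentry_sdk
from sentry_sdk.transport import HttpTransport
from urllib3.connection import HTTPConnection


class KeepAliveHttpTransport(HttpTransport):
    # Sketch: TCP keepalive on the transport's connection pool plus a small
    # delay between outgoing requests. Values are illustrative, not recommendations.
    def _get_pool_options(self, ca_certs):
        opts = super()._get_pool_options(ca_certs)
        opts['socket_options'] = HTTPConnection.default_socket_options + [
            (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
            (socket.SOL_TCP, socket.TCP_KEEPIDLE, 45),
            (socket.SOL_TCP, socket.TCP_KEEPINTVL, 10),
            (socket.SOL_TCP, socket.TCP_KEEPCNT, 6),
        ]
        return opts

    def _send_request(self, *args, **kwargs):
        time.sleep(0.01)
        return super()._send_request(*args, **kwargs)


sentry_sdk.init(
    ...,  # same options as above (placeholders)
    transport=KeepAliveHttpTransport,
)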

sentrivana avatar Feb 16 '24 09:02 sentrivana

@sentrivana thanks for getting back -- when trying the custom socket settings I wasn't able to see any major difference. I'm now testing the whole package (custom socket settings plus the custom _send_request).

manuellazzari-cargoone avatar Feb 16 '24 16:02 manuellazzari-cargoone

@sentrivana the issue is still there even after adding the custom _send_request. Qualitatively, it seems to appear with more or less the same frequency (see below an extract from our logs from before and after the changes were deployed).

2024-02-16T15:29:33.048565197Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:30:55.565954424Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:33:13.537149008Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:33:17.914818994Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:34:47.778557487Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:34:47.819326396Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:37:33.774480759Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:37:34.586534890Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:41:53.741483570Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:45:11.560933696Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:51:10.940568800Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:51:11.016053254Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:51:20.146324668Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:51:28.814144310Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:58:53.136957655Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:58:54.053616565Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:59:16.002942751Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:03:12.954825518Z Initializing Sentry SDK... <-- changes deployed
...
2024-02-16T16:13:44.713686531Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:22:30.285920011Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:25:48.997920085Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:25:49.389493362Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:26:00.889221302Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:38:44.314760846Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk

manuellazzari-cargoone avatar Feb 16 '24 16:02 manuellazzari-cargoone

@manuellazzari-cargoone Thanks for the follow-up. It looks like the sleep might've had a tiny effect -- at least from the logs it looks like there are fewer occurrences over a comparable time span before and after -- but this could obviously have to do with traffic etc., so I don't think it's conclusive.

Are you seeing any network errors for outgoing requests anywhere else in your system? I'm just trying to rule out general network instability. Alternatively, I'm wondering whether there's anything special about the errors/transactions -- maybe they're unusually big, so each one takes long to send and the server drops the connection? Fiddling around with the socket options might make a difference, too.
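
For instance, a variation on the keepalive values from the snippet above with a shorter idle timer, so probes start before an intermediary has a chance to silently drop the connection -- the numbers are guesses to experiment with, not recommendations:

import socket

from urllib3.connection import HTTPConnection

# Start keepalive probes after 10s of idle time, probe every 5s,
# and give up after 6 failed probes. Drop this into the
# _get_pool_options override shown earlier:
#     opts['socket_options'] = keepalive_socket_options
keepalive_socket_options = HTTPConnection.default_socket_options + [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
    (socket.SOL_TCP, socket.TCP_KEEPIDLE, 10),
    (socket.SOL_TCP, socket.TCP_KEEPINTVL, 5),
    (socket.SOL_TCP, socket.TCP_KEEPCNT, 6),
]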

sentrivana avatar Feb 20 '24 15:02 sentrivana

Are you seeing any network errors for outgoing requests anywhere else in your system? Just trying to rule out general network instability.

I'm observing this issue with just two services, both running with gunicorn. It's also all on GCP, so I would rule out consistent network instability.

Alternatively I'm thinking if there's anything special about the errors/transactions -- maybe them being unusually big so it takes long to send each one and the server drops the connection?

I'm not sure about size, but I know for sure we have services that deal with a lot more traffic and much bigger payloads. In particular, one of the affected services handles very little traffic and small payloads and is still experiencing the problem.

manuellazzari-cargoone avatar Feb 21 '24 11:02 manuellazzari-cargoone

Thanks! I think in general we need to reconsider our transport logic and possibly add retries for more cases. (Currently we only retry if we're rate limited.) But at the same time, with this particular error there's no telling if the server actually processed the event and just crapped out at the very end, so it's not clear cut.
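
In the meantime, anyone who wants to experiment could add retries at the urllib3 level in a custom transport. A rough sketch, assuming the dict returned by _get_pool_options is passed through to the underlying connection pool (and keeping in mind that a retried request can mean the server receives the same envelope twice):

from sentry_sdk.transport import HttpTransport
from urllib3.util.retry import Retry


class RetryingHttpTransport(HttpTransport):
    # Sketch only: give urllib3 a Retry policy via the pool options so that
    # connection/read errors like RemoteDisconnected are retried a few times
    # with backoff instead of surfacing as "Internal error in sentry_sdk".
    def _get_pool_options(self, ca_certs):
        opts = super()._get_pool_options(ca_certs)
        opts['retries'] = Retry(
            total=3,
            connect=3,
            read=3,
            backoff_factor=0.5,
            allowed_methods=None,  # retry POSTs too; the default set excludes them
        )
        return opts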

sentrivana avatar Feb 27 '24 15:02 sentrivana