Broken connections on Celery worker with logging enabled
How do you use Sentry?
Sentry SaaS (sentry.io)
Version
2.29.1
Steps to Reproduce
Hi,
I'm running a Celery worker with a concurrency of 16 and recently enabled the experimental logging feature in the Python SDK. Everything works fine initially, but after a while I start seeing this error repeatedly:
[2025-06-08 08:23:14,183: WARNING/ForkPoolWorker-27] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2570)')': /api/4509207728947280/envelope/
This happens under high task load, and workers eventually become unresponsive. If I disable the Sentry logging integration, the problem disappears.
Setup details:
Celery 5.5.2
Python Sentry SDK versions tried: 2.28.0 and 2.29.1
Python version: 3.12.8
Worker command:
celery worker \
--concurrency=16 \
--loglevel=info \
-Q q1,q2 \
--max-memory-per-child=2621440 \
--max-tasks-per-child=10 \
--without-gossip \
--without-mingle \
--heartbeat-interval=10
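The logging feature is enabled via the experimental flag in sentry_sdk.init; a stripped-down version of my init looks roughly like this (full init posted in a later comment):
import sentry_sdk

sentry_sdk.init(
    dsn="<dsn>",
    # experimental logging feature referenced above
    _experiments={
        "enable_logs": True,
    },
)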
Any idea what could be causing this SSL error with the logging feature?
Expected Result
No SSL errors
Actual Result
[2025-06-08 08:23:14,183: WARNING/ForkPoolWorker-27] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2570)')': /api/4509207728947280/envelope/
Hey @mcboman, thanks for raising this.
To clarify, when you say "If I disable the Sentry logging integration, the problem disappears", do you mean that you add LoggingIntegration to disabled_integrations in your init, or that you turn off/remove the experimental enable_logs option?
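For reference, here's a rough sketch of the two options I mean (not your exact config, just to illustrate):
import sentry_sdk
from sentry_sdk.integrations.logging import LoggingIntegration

# Option A: disable the logging integration entirely
sentry_sdk.init(
    dsn="<dsn>",
    disabled_integrations=[LoggingIntegration()],
)

# Option B: keep the integration, but turn off the experimental logs feature
sentry_sdk.init(
    dsn="<dsn>",
    _experiments={"enable_logs": False},
)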
Before the SSL errors start popping up, is there any other error visible in the logs? I'm not sure where the SSL errors could be coming from; my hunch is that they're a consequence of something else happening.
Approximately how many logs are your workers sending to Sentry? And if you adjust sentry_logs_level to a higher level (see below), does anything change?
import logging

import sentry_sdk
from sentry_sdk.integrations.logging import LoggingIntegration

sentry_sdk.init(
    ...
    integrations=[
        LoggingIntegration(sentry_logs_level=logging.ERROR),  # or maybe logging.WARNING; default is logging.INFO
    ],
)
Hi @sentrivana
It is the experimental logging feature, not the logging integration. I have attached the code from my sentry_sdk.init below.
Peak log volume is 90k per half hour.
import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration

sentry_sdk.init(
    dsn="<dsn>",
    send_default_pii=True,
    integrations=[
        CeleryIntegration(
            monitor_beat_tasks=True,
        ),
        SqlalchemyIntegration(),
    ],
    traces_sample_rate=0.1,
    profile_session_sample_rate=0.1,
    profile_lifecycle="trace",
    _experiments={
        "enable_logs": True,
    },
)
Thanks for the information, we will investigate this.
We'll try to reproduce this with a load test.
In the meantime, any additional information would be helpful. Some things I can think of atm:
- Can you see any other error prior to the SSL messages? This could help in pinpointing the initial issue.
- Does adjusting the level (sentry_logs_level, see the example above) by setting it higher help? Or does it maybe delay the issue? This could tell us whether or not this is a load problem.
- Where are you initializing Sentry? This might be useful info for reproducing the error.
Wasn't able to get this to trigger in a load test.
In addition to the points above: I see you have profiling on, so is there perhaps anything that stands out in the profiles?
(We can also take a look if you don't mind sharing your org, or feel free to send it to me at ivana.kellyer at sentry.io.)
import os

import sentry_sdk
from celery import Celery
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration

# setup_logging and get_redis_url_for_celery are project-internal helpers


def create_celery() -> Celery:
    """
    Create a new Celery object and set the broker and result backend from config.
    """
    sentry_mode = os.getenv("SENTRY_MODE", "prod")
    if sentry_mode != "dev":
        sentry_sdk.init(
            dsn="dsn",
            send_default_pii=True,
            integrations=[
                CeleryIntegration(
                    monitor_beat_tasks=True,
                ),
                SqlalchemyIntegration(),
            ],
            traces_sample_rate=0.1,
            profile_session_sample_rate=0.1,
            profile_lifecycle="trace",
        )
    setup_logging()
    celery = Celery(__name__)
    celery.conf.broker_url = get_redis_url_for_celery()
    celery.conf.result_backend = get_redis_url_for_celery()
    celery.conf.broker_connection_retry_on_startup = True
    celery.conf.update(
        {
            "worker_hijack_root_logger": False,
            "timezone": "Europe/Copenhagen",
            "enable_utc": True,
            "accept_content": ["json"],
            "task_serializer": "json",
            "result_serializer": "json",
            "worker_concurrency": 5,
            "worker_prefetch_multiplier": 1,
            "broker_connection_retry": True,
            "broker_pool_limit": 0,
            "broker_transport_options": {
                "max_connections": 1000,
                "visibility_timeout": 3600,
                "socket_timeout": 30,
                "socket_connect_timeout": 30,
                "socket_keepalive": True,
                "retry_on_timeout": True,
            },
            "task_acks_late": True,
        }
    )
    print("Celery timezone: ", celery.conf.timezone)
    return celery
This is how we initialize it.
Entrypoint for starting the container:
CELERY_OPTS_DEFAULT=(
--concurrency=16
--loglevel=info
-Q q1,q2,q3,q4
--max-memory-per-child=2621440
--max-tasks-per-child=10
--without-gossip
--without-mingle
--heartbeat-interval=10
)
exec celery -A "$CELERY_APP" worker "${CELERY_OPTS_DEFAULT[@]}"
There are no other errors prior to the exceptions, but I suspect the max-tasks-per-child restart of a worker might be what starts it all, because the problem doesn't appear until some time has passed.
@mcboman could you please provide us with your org slug/ID or DSN? Thanks
I want to give a little update.
I changed it to initialize Sentry in a @worker_process_init.connect handler instead of running it in the create_celery method.
It has been running for some days now without any problems.
I have updated the SDK to version >= 2.30.0.
The root cause might have been the worker restarting after 10 tasks, causing a bad cut-off.
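Roughly, the new setup looks like this (simplified sketch; same init options as in my earlier snippet):
import sentry_sdk
from celery.signals import worker_process_init
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration


@worker_process_init.connect(weak=False)
def init_sentry(**_kwargs):
    # Runs in every forked worker process, so Sentry is (re)initialized per child
    # instead of once in create_celery() before the pool forks.
    sentry_sdk.init(
        dsn="<dsn>",
        send_default_pii=True,
        integrations=[
            CeleryIntegration(monitor_beat_tasks=True),
            SqlalchemyIntegration(),
        ],
        traces_sample_rate=0.1,
        profile_session_sample_rate=0.1,
        profile_lifecycle="trace",
        _experiments={"enable_logs": True},
    )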
@mcboman glad to hear you were able to fix the problem!