
uWSGI worker throwing Runtime Exception when run with `py-call-uwsgi-fork-hooks` flag

Open zaheerabbas-prodigal opened this issue 11 months ago • 4 comments

How do you use Sentry?

Self-hosted/on-premise

Version

1.40.4

Steps to Reproduce

Hello Sentry team,

We run a Flask application behind uWSGI and recently integrated Sentry into one of our services with Sentry SDK 1.40.0. A few days into testing, we ran into issues with the uWSGI listen queue being full; the exact log message is `uWSGI listen queue of socket "127.0.0.1:41987" (fd: 3) full`, and no further requests were being served by our application.

After investigating, we found that this was due to a deadlock between worker respawning and error logging, as described in this comment: https://github.com/unbit/uwsgi/issues/1369#issuecomment-512999746.

We also found this comment, https://github.com/getsentry/sentry-python/issues/2699#issuecomment-1944336675, which pointed us to make the following changes:

  1. Upgrade Sentry SDK version to 1.40.4
  2. Add `py-call-uwsgi-fork-hooks` to the uWSGI config (we were already using `enable-threads` in the config)

After the above changes, we could no longer replicate the full listen queue, but we ran into RuntimeErrors which are again related to locks in the logging module. The log lines below occur every time a worker respawns:

worker 1 lifetime reached, it was running for 61 second(s)
Respawned uWSGI worker 1 (new pid: 1208)
Exception ignored in: <function _after_at_fork_child_reinit_locks at 0x7f3777401160>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/logging/__init__.py", line 264, in _after_at_fork_child_reinit_locks
    _releaseLock()  # Acquired by os.register_at_fork(before=.
  File "/usr/local/lib/python3.8/logging/__init__.py", line 232, in _releaseLock
    _lock.release()
RuntimeError: cannot release un-acquired lock

Python version: 3.8.14
Hosted on: AWS EKS
Sentry On-Prem Server Version: 24.1.2
uWSGI Version: 2.0.21

uWSGI config that is causing the Runtime exception:

[uwsgi]
http = 0.0.0.0:9123
module = flask_app:app
buffer-size = 32768

strict = true
master = true
enable-threads = true
processes = 4
threads = 4
vacuum = true                       ; Delete sockets during shutdown
single-interpreter = true
die-on-term = true                  ; Shutdown when receiving SIGTERM (default is respawn)
need-app = true                     ; Do not start if application module not found

harakiri = 60                        ; Forcefully kill workers after this many seconds
add-header = Connection: Keep-Alive
http-keepalive = 1
auto-procname = true                 ; Better naming of uwsgi worker processes to identify easily

max-requests = 1000                  ; Restart workers after this many requests
max-worker-lifetime = 3600           ; Restart workers after this many seconds
reload-on-rss = 900                  ; Restart workers above this much resident memory (MB)
worker-reload-mercy = 60             ; How long to wait before forcefully killing workers

disable-logging = true               ; Disable built-in logging 
log-4xx = true                       ; but log 4xx's anyway
log-5xx = true                       ; and 5xx's

py-call-uwsgi-fork-hooks = true      ; https://docs.sentry.io/platforms/python/integrations/flask/#behavior

Reproduction Steps:

  1. Flask server with the above uWSGI config as a config.ini file
  2. Sentry SDK version 1.40.4
  3. Sentry init code:

     import sentry_sdk

     sentry_sdk.init(
         environment="DEV",
         sample_rate=0.8,
         max_breadcrumbs=50,
         enable_tracing=True,
         traces_sample_rate=0.2,
         profiles_sample_rate=0.2,
     )

  4. Run uwsgi config.ini
  5. Wait for any worker to respawn. (We reduced max-worker-lifetime to 60 seconds to replicate this quickly.)

Expected Result

No exception should be thrown, since Sentry's recommendation is to run uWSGI with the `py-call-uwsgi-fork-hooks` flag, as per this comment https://github.com/getsentry/sentry-python/issues/2699#issuecomment-1944336675 and the Flask integration docs (https://docs.sentry.io/platforms/python/integrations/flask/#behavior).

Actual Result

A RuntimeError is thrown every time a worker respawns:

worker 1 lifetime reached, it was running for 61 second(s)
Respawned uWSGI worker 1 (new pid: 1208)
Exception ignored in: <function _after_at_fork_child_reinit_locks at 0x7f3777401160>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/logging/__init__.py", line 264, in _after_at_fork_child_reinit_locks
    _releaseLock()  # Acquired by os.register_at_fork(before=.
  File "/usr/local/lib/python3.8/logging/__init__.py", line 232, in _releaseLock
    _lock.release()
RuntimeError: cannot release un-acquired lock

zaheerabbas-prodigal avatar Mar 01 '24 15:03 zaheerabbas-prodigal

Hey @zaheerabbas-prodigal, thank you for reporting and for providing all the details!

As far as I can tell this is caused by uWSGI not calling all the necessary fork hooks. Basically, in pre-fork mode, everything is initialized in one process that is then forked into additional workers. When forking a process, C extensions are expected to call PyOS_BeforeFork() before forking, PyOS_AfterFork_Parent() in the parent afterwards, and PyOS_AfterFork_Child() in the child, to do stuff like clean up threads in the child processes.

The problem is that by default, uWSGI calls none of them. If you supply --py-call-uwsgi-fork-hooks as we recommend, PyOS_AfterFork_Child() will be called (and take care of some of the necessary thread cleanup), but the other two hooks still won't, and I assume PyOS_BeforeFork() acquires the logging lock in question that PyOS_AfterFork_Child() is then trying to release.
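The failure mode can be sketched in plain Python (a simplified illustration, not uWSGI's actual code): CPython's logging module registers fork hooks via os.register_at_fork(), and its module-level lock is an RLock. If the before-fork hook never runs because the forking code skips PyOS_BeforeFork(), the child-side hook ends up releasing a lock that was never acquired:

```python
import threading

# Stand-in for logging's module-level _lock, which is an RLock.
_lock = threading.RLock()

def before_fork():
    # Would run via os.register_at_fork(before=...), i.e. PyOS_BeforeFork().
    _lock.acquire()

def after_fork_in_child():
    # Runs via PyOS_AfterFork_Child(), which --py-call-uwsgi-fork-hooks enables.
    _lock.release()

# uWSGI's default fork path skips PyOS_BeforeFork(), so before_fork() never
# runs; the child then tries to release an un-acquired lock.
try:
    after_fork_in_child()
except RuntimeError as exc:
    print(exc)  # cannot release un-acquired lock
```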

There is a long-standing uWSGI pull request to utilize all three hooks.

As for what to do in your case, you can disable pre-forking mode completely with --lazy-apps, in which case every worker sets up the app itself and there is no broken forking involved.
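For the reporter's setup, that maps onto the posted config as a one-line change (a sketch; with lazy-apps, each worker loads the app itself after the fork, so the pre-fork lock state never exists in the child):

```ini
[uwsgi]
; ... existing options from config.ini ...
lazy-apps = true    ; load the app in each worker post-fork instead of in the master
```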

sentrivana avatar Mar 04 '24 11:03 sentrivana

@sentrivana - thank you for explaining the issue. I appreciate the quick response.

I have asked for an update on the PR you linked. --lazy-apps might cause some memory issues for us. Is there any other way we could make the Sentry integration work? Has no one else in the community faced this problem with uWSGI and Sentry since the Python 3.7+ changes where this was introduced?

zaheerabbas-prodigal avatar Mar 04 '24 14:03 zaheerabbas-prodigal

Is there any other way we could make the sentry integration work?

Not that I know of. We're usually happy to either patch or work around things in libraries and frameworks that don't work with the SDK but the missing fork hooks is something so fundamental that there's not much we can do about it. The only solution as long as the upstream PR is not merged is to skip the broken forking altogether with --lazy-apps or, unfortunately, to switch away from uWSGI.

Has no one else in the community faced this problem with uWSGI <> Sentry post Python 3.7+ changes where this was introduced?

We've only started recommending using --py-call-uwsgi-fork-hooks fairly recently which might explain why there haven't been other reports about this so far. That being said, I didn't encounter this myself while working on the original issue. Granted, I was using a very bare-bones Flask app to repro and wasn't doing any logging stuff in the app itself, but things were still being logged indirectly via e.g. the SDK's debug mode. So the RuntimeError might be triggered by something specific in your logging setup -- it's a long shot but playing around with where and when you log things that might also be a workaround.

sentrivana avatar Mar 04 '24 15:03 sentrivana

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

getsantry[bot] avatar Jun 28 '24 07:06 getsantry[bot]