richie
richie copied to clipboard
Richie Gunicorn worker freezes and no stack trace on log
During my work with putting NAU richie site installation on premises, instead of using the AWS like FUN, I found a problem that made the Gunicorn worker to timeout. The problem was something strange, that probably won't happen on other installations. The virtual machine that has the Richie didn't had connectivity to the on-premises S3 Bucket provider (Ceph).
My problem was not having at least a single line of stack trace on richie app logs. I don't know if anyone has some related strange problems, but after trying different approaches I found that this works and I can see the errors !!
So for us the solution was to add this block to the end of 'app.py' file on nau-richie-site-factory.
Note: I think the problem is not related to Richie core, but to our installation.
Code block
# Handler that forces Gunicorn to print the current stack trace, when its worker timeout’s.
# This can happen when a request hangs waiting for an external service dependency, like when a
# database is too slow or can connect to the S3 Bucket provider. The worker receives the SIGABRT
# signal from the Gunicorn parent process and this handler runs before the worker it stopped.
# Reference:
# https://stackoverflow.com/questions/15442361/dump-django-stack-trace-on-gunicorn-timeout
def worker_abort(worker):
import traceback, io
debug_info = io.StringIO()
debug_info.write("Traceback at time of timeout:\n")
traceback.print_stack(file=debug_info)
worker.log.critical(debug_info.getvalue())
Log lines that I could view on richie_app docker container after adding the code block on 'app.py' file.
[2021-09-02 16:20:13 +0100] [13] [INFO] Worker exiting (pid: 13)
[2021-09-02 15:20:13 +0000] [17] [INFO] Booting worker with pid: 17
[2021-09-02 15:21:02 +0000] [10] [CRITICAL] WORKER TIMEOUT (pid:14)
[2021-09-02 16:21:02 +0100] [14] [CRITICAL] Traceback at time of timeout:
File "/usr/local/bin/gunicorn", line 8, in <module>
sys.exit(run())
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 228, in run
super().run()
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 202, in run
self.manage_workers()
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
self.spawn_workers()
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
self.spawn_worker()
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 140, in init_process
self.run()
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 123, in run
self.run_for_one(timeout)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 67, in run_for_one
self.accept(listener)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 29, in accept
self.handle(listener, client, addr)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/wsgi.py", line 133, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py", line 130, in get_response
response = self._middleware_chain(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
response = response or self.get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py", line 181, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/filer/admin/clipboardadmin.py", line 115, in ajax_upload
file_obj.save()
File "/usr/local/lib/python3.7/site-packages/filer/models/imagemodels.py", line 30, in save
exif_date = self.exif.get('DateTimeOriginal', None)
File "/usr/local/lib/python3.7/site-packages/filer/models/abstract.py", line 88, in _get_exif
self._exif_cache = get_exif_for_file(self.file)
File "/usr/local/lib/python3.7/site-packages/filer/utils/pil_exif.py", line 19, in get_exif_for_file
im = PILImage.open(storage.open(file_obj.name), 'r')
File "/usr/local/lib/python3.7/site-packages/django/core/files/storage.py", line 38, in open
return self._open(name, mode)
File "/usr/local/lib/python3.7/site-packages/storages/backends/s3boto3.py", line 427, in _open
f = S3Boto3StorageFile(name, mode, self)
File "/usr/local/lib/python3.7/site-packages/storages/backends/s3boto3.py", line 107, in __init__
self.obj.load()
File "/usr/local/lib/python3.7/site-packages/boto3/resources/factory.py", line 505, in do_action
response = action(self, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/boto3/resources/action.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(*args, **params)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 663, in _make_api_call
operation_model, request_dict, request_context)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 682, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 135, in _send_request
request, operation_model, context)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 167, in _get_response
request, operation_model)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/usr/local/lib/python3.7/site-packages/botocore/httpsession.py", line 317, in send
chunked=self._chunked(request.headers),
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 234, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/local/lib/python3.7/http/client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.7/site-packages/botocore/awsrequest.py", line 92, in _send_request
method, url, body, headers, *args, **kwargs)
File "/usr/local/lib/python3.7/http/client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.7/http/client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.7/site-packages/botocore/awsrequest.py", line 119, in _send_output
self.send(msg)
File "/usr/local/lib/python3.7/site-packages/botocore/awsrequest.py", line 203, in send
return super(AWSConnection, self).send(str)
File "/usr/local/lib/python3.7/http/client.py", line 976, in send
self.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 200, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 170, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 86, in create_connection
sock.connect(sa)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 200, in handle_abort
self.cfg.worker_abort(self)
File "/usr/local/etc/gunicorn/app.py", line 27, in worker_abort
traceback.print_stack(file=debug_info)
[2021-09-02 16:21:02 +0100] [14] [INFO] Worker exiting (pid: 14)
[2021-09-02 15:21:02 +0000] [18] [INFO] Booting worker with pid: 18
Hi @igobranco :wave:
Thanks for the report! Can you paste your gunicorn configuration? Is it similar to the default one we use [1]?
[1] https://github.com/openfun/richie/blob/master/docker/files/usr/local/etc/gunicorn/richie.py
Hi @igobranco
Thanks for the report! Can you paste your
gunicornconfiguration? Is it similar to the default one we use [1]?[1] https://github.com/openfun/richie/blob/master/docker/files/usr/local/etc/gunicorn/richie.py
Hi @jmaupetit Yes, it's similar.
Particularly the logging configuration, right?
Yes, the file is equal with the base richie.py, event the logging configuration.
NAU version after adding that additional code block: https://github.com/fccn/nau-richie-site-factory/blob/master/docker/files/usr/local/etc/gunicorn/app.py
Sorry for the noise: I didn't notice that you already posted your configuration in the issue description... :sweat_smile:
