
We must replace uwsgi with something else

regisb opened this issue 1 year ago • 14 comments

uwsgi is now in maintenance mode: https://uwsgi-docs.readthedocs.io/en/latest/

The project is in maintenance mode (only bugfixes and updates for new languages apis). Do not expect quick answers on github issues and/or pull requests (sorry for that) A big thanks to all of the users and contributors since 2009.

So we should take the opportunity that we are releasing Quince to remove uwsgi from the code base.

regisb avatar Nov 14 '23 14:11 regisb

Before jumping into the context, the following 3 terms are very different despite having similar names:

  • WSGI (Web Server Gateway Interface) - a specification that defines the interface between a web server and a Python application. Open edX backend apps implement it via their wsgi.py files.
  • uWSGI - a utility/package for building hosting services compliant with the WSGI standard (this is the uwsgi this issue is talking about). It bundles many things: an HTTP server, proxies, process managers, monitoring, etc.
  • uwsgi - a binary protocol that uWSGI uses to talk to other web servers (such as nginx).
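To make the first distinction concrete, here is a minimal WSGI application of the kind a wsgi.py file exposes (a generic sketch, not Open edX's actual entry point); any WSGI server discussed in this thread, uWSGI or otherwise, can host a callable like this:

```python
# wsgi.py - minimal WSGI application, illustrating the interface the spec defines.
# A WSGI server (uWSGI, gunicorn, waitress, ...) imports and calls `application`.

def application(environ, start_response):
    """WSGI entry point: receives the request environ dict and a start_response
    callable, and returns an iterable of bytes for the response body."""
    body = b"Hello from WSGI\n"
    status = "200 OK"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

With uWSGI this would be served with something like `uwsgi --http :8000 --wsgi-file wsgi.py`; with gunicorn, `gunicorn wsgi:application`.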

Without going deeper into the various utilities, the top results that could serve as alternatives to uWSGI are gunicorn, mod_wsgi, CherryPy, and Waitress.

Looking into Tutor, I found that gunicorn was used before uWSGI; the migration to uWSGI was done during the Koa upgrade (https://github.com/overhangio/tutor/commit/728ef966dc572d37cf76f08c112277a5111b0d6b). I am not sure whether we can still revert to gunicorn in the current state. Compared with uWSGI, gunicorn is reportedly slower (there is a comparison at https://emptyhammock.com/projects/info/pyweb/gunicorn.html, though it may not be entirely accurate).

mod_wsgi is mainly an Apache module; there were some references to an nginx mod_wsgi, but I was not able to find any recent information on it. CherryPy is both a web server and a minimalist web framework. Waitress, on the other hand, is a pure-Python WSGI-only server that runs on CPython and PyPy (Python 3.7+).

I will be looking into each item and further explore any other alternatives as well.

References

  1. https://stackoverflow.com/questions/38601440/what-is-the-point-of-uwsgi
  2. https://www.ultravioletsoftware.com/single-post/2017/03/23/An-introduction-into-the-WSGI-ecosystem
  3. https://medium.com/django-deployment/which-wsgi-server-should-i-use-a70548da6a83
  4. https://github.com/unbit/uwsgi/issues/2425
  5. https://github.com/overhangio/tutor/commit/728ef966dc572d37cf76f08c112277a5111b0d6b
  6. https://emptyhammock.com/projects/info/pyweb/not-uwsgi.html

DawoudSheraz avatar Jan 26 '24 11:01 DawoudSheraz

I'm thinking that a possible alternative would be to revert to gunicorn, but launch a separate container with a web server (such as Caddy or nginx) to serve static assets. I hate that we would have to launch a new container just for that (and for every other web app...), but it's the only solution I'm seeing right now.

regisb avatar Feb 13 '24 09:02 regisb

An alternative that I'm looking at for our WSGI server use case is nginx-unit, but I haven't done any testing of it yet.

blarghmatey avatar Feb 15 '24 14:02 blarghmatey

Hi folks. I was pointed to this ticket after I posted a forums question about the performance implications of serving static assets with uWSGI.

@DawoudSheraz:

Compared with uWSGI, gunicorn is slow (some comparison is on https://emptyhammock.com/projects/info/pyweb/gunicorn.html, however it might not be exactly true).

I feel like throughput benchmarks aren't really that useful for us. Even the slowest thing there shows gunicorn with 4 worker processes giving a throughput of roughly 3200 requests per second, so ~800 req/s per worker, which means the overhead it's imposing for spawning a request and reading/writing 70K is something like 1.25 ms. I think that even our fastest web transactions run in the 30-50 ms range, with more common ones running 150+ ms, and a number of painful courseware calls running multiple seconds. Whether the WSGI server is imposing 1.25 ms of overhead or 0.4 ms of overhead won't really be noticeable in that context.
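The arithmetic in the paragraph above can be spelled out directly (the 3200 req/s and 4-worker figures are the quoted benchmark numbers; the 150 ms transaction time is this comment's own estimate):

```python
# Rough arithmetic behind the "overhead doesn't matter" argument.
benchmark_throughput = 3200  # req/s across all workers (quoted benchmark figure)
workers = 4
per_worker_throughput = benchmark_throughput / workers  # 800 req/s per worker
overhead_ms = 1000 / per_worker_throughput              # 1.25 ms per request
transaction_ms = 150         # a common edxapp transaction time
share = overhead_ms / transaction_ms  # fraction of the transaction spent in the WSGI layer
print(f"{overhead_ms:.2f} ms overhead = {share:.1%} of a {transaction_ms} ms request")
```

Under one percent of a typical transaction, which is why shaving that overhead further would be hard to notice.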

@regisb:

I hate the fact that we would have to launch a new container just for that (and for every other web app...), but it's the only solution that I'm seeing right now.

I'm still really ignorant about Docker things. Would the idea be that we'd have one static asset server container running Caddy, and that there's some shared volume where each Django service writes its static files to a different sub-directory? Or a volume per service, with the Caddy static assets container having an entry for each?
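One hypothetical shape for the shared-volume variant of that idea (all service names, image names, and paths below are illustrative, not Tutor's actual layout): each Django service writes its collected static files into its own sub-directory of one shared volume, which a single Caddy container mounts read-only and serves.

```yaml
# docker-compose sketch of the shared-volume idea (names are illustrative)
services:
  lms:
    image: openedx-lms            # hypothetical image name
    volumes:
      - static:/openedx/staticfiles/lms   # `collectstatic` output goes here
  cms:
    image: openedx-cms
    volumes:
      - static:/openedx/staticfiles/cms
  static-server:
    image: caddy:2
    command: caddy file-server --root /srv/static
    volumes:
      - static:/srv/static:ro    # one server, read-only view of every app's assets
volumes:
  static: {}
```

The per-service-volume variant would instead declare one named volume per app and give the Caddy container one mount entry for each.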

@blarghmatey:

An alternative that I'm looking at for our WSGI server uses is nginx-unit, but I haven't done any testing of that yet.

My main hesitation with this is that gunicorn is so ubiquitous for serving Python apps that a lot of tooling accommodates it out of the box. For instance, New Relic will give you things like worker availability, utilization, restart events, etc., and I imagine many other APMs do the same. Gunicorn is also going to be more familiar to Python developers, and it is easier to find solutions for common problems via StackOverflow and the like.

ormsbee avatar Apr 28 '24 03:04 ormsbee

@ormsbee Hi, thanks for the context. I have not had time to continue this issue. There is some missing context for me which some experimentation (with gunicorn and uwsgi) will help provide.

DawoudSheraz avatar May 03 '24 13:05 DawoudSheraz

I'll add another long-term alternative to uwsgi: granian. It's implemented in Rust using the hyper library. Its selling points are highly consistent response times (much smaller 99th-percentile deviations, because the networking I/O stack is on the Rust side) and the ability to have a single server do both ASGI and WSGI (along with its own RSGI standard that it's trying to push).

I don't think it's appropriate for us at this time. It's a single developer effort, and it has no support for "restart the worker after X requests", which we unfortunately need because of memory leaks. I merely mention it as something to keep an eye on in the longer term.

ormsbee avatar Jun 05 '24 14:06 ormsbee

it has no support for "restart the worker after X requests", which we unfortunately need because of memory leaks.

Do you have an idea of what is causing these memory leaks?

Note: here's the upstream granian issue: https://github.com/emmett-framework/granian/issues/34

regisb avatar Jun 07 '24 07:06 regisb

The last time someone investigated this, a lot of it owed to circular references in the XBlock runtimes and modulestore. I haven't looked into it since the major runtime simplification or old mongo removal PRs landed.

ormsbee avatar Jun 07 '24 20:06 ormsbee

For the record, the uWSGI configuration that currently ships with Tutor for the LMS/CMS containers does not make use of the "restart the worker after X requests" flag (max-requests). Tutor users are not complaining, so... I guess we don't need it?
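For reference, this is roughly what that flag looks like in a uWSGI ini file (an illustrative fragment, not Tutor's shipped configuration), together with gunicorn's equivalent options:

```ini
[uwsgi]
; recycle each worker after it has served this many requests,
; bounding the impact of slow memory leaks
max-requests = 1000

; gunicorn equivalents (command-line flags):
;   --max-requests 1000 --max-requests-jitter 100
; the jitter spreads restarts out so workers don't all recycle at once
```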

regisb avatar Jun 13 '24 12:06 regisb

For the record, the uWSGI configuration that currently ships with Tutor for the LMS/CMS containers does not make use of the "restart the worker after X requests" flag (max-requests). Tutor users are not complaining, so... I guess we don't need it?

I hope that means the problem has gotten a lot better. I don't know what value it's set to for 2U these days, but we definitely had issues with it in the past and that's why there's a slot for it in the configuration repo. I suppose it's also possible that it's an issue with gunicorn specifically that's causing this...

In any case, FYI to @timmc-edx, @robrap, @dianakhuang in case this is of interest for how 2U deploys.

ormsbee avatar Jun 14 '24 15:06 ormsbee

Thanks @ormsbee. I think we still have this set for edxapp in Stage, Prod and Edge. I don't think we use it for any of our other workers. However, it's likely to remain this way, given the old "if it ain't broke, don't fix it", along with all of our other priorities. We've had too many recent incidents from changes (mostly DD related) that seem like they shouldn't cause any issues, but surprise surprise.

robrap avatar Jun 18 '24 18:06 robrap