Django Multicore
================

An app that makes it easy to parallelize Django code.
.. figure:: https://travis-ci.org/praekelt/django-multicore.svg?branch=develop
   :align: center
   :alt: Travis
.. contents:: Contents
   :depth: 5
Installation
------------

#. Install or add ``dill`` and ``django-multicore`` to your Python path.

#. Add ``multicore`` to your ``INSTALLED_APPS`` setting, as in the sketch below.
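A minimal ``settings.py`` sketch; only the ``multicore`` entry is specific to this app, the rest are placeholders::

    # settings.py
    INSTALLED_APPS = [
        "django.contrib.auth",
        "django.contrib.contenttypes",
        # ... your own apps ...
        "multicore",
    ]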
Overview
--------

Django itself is a single-threaded framework, but many pieces of code are trivially easy to run in parallel. Multi-threading is typically used to achieve this, but it is still subject to Python's Global Interpreter Lock (GIL). Python's ``multiprocessing`` module bypasses the GIL but is harder to use within Django.
This app presents a simple interface for writing parallel code.
Features
--------

#. Persistent pool of workers enabling persistent database connections.

#. Can take the system load average into account to decide whether parallelization is worth it at any given time.
Architecture
------------

Django Multicore is effectively an in-memory queue that is processed by a fixed set of workers. It uses memory mapping to avoid the latency imposed by a queuing system such as Celery.
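To give a feel for the idea, here is a toy Unix-only sketch of memory-mapped inter-process communication using the standard library's ``mmap``. It is *not* the library's actual protocol (the real thing adds framing, queueing and a worker pool); it only shows how a forked worker can hand bytes back to the parent without a pipe or socket round trip::

    import mmap
    import os

    # An anonymous memory map is shared across fork on Unix
    buf = mmap.mmap(-1, 1024)

    pid = os.fork()
    if pid == 0:
        # Child "worker": write a result into the shared map
        buf.seek(0)
        buf.write(b"result from worker")
        os._exit(0)

    os.waitpid(pid, 0)  # parent waits for the worker
    buf.seek(0)
    print(buf.read(18))  # b'result from worker'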
Usage
-----

Let's render 100 users. Always break a large task into smaller tasks, but not too small! If the ranges are too small then the tasks aren't worth the effort, because the overhead becomes too large::

    import time

    from django.contrib.auth.models import User

    from multicore import Task
    from multicore.utils import ranges


    def expensive_render(user):
        # Simulate a slow rendering step
        time.sleep(0.01)
        return user.username


    def multi_expensive_render(start, end):
        s = ""
        for user in User.objects.all()[start:end]:
            s += expensive_render(user)
        return s


    task = Task()
    users = User.objects.all()[:100]
    for start, end in ranges(users):
        # Note we don't pass "users" to run because it can't be pickled
        task.run(multi_expensive_render, start, end)
    print(", ".join(task.get()))
If we control the code it's easy, but sometimes we need to monkey-patch code in another app. Django REST Framework's list fetch is a prime candidate for parallelization. Here's the original code (pagination block omitted for brevity)::

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
        serializer = self.get_serializer(queryset, many=True)
        return Response(serializer.data)
Rewriting it requires a lot of knowledge of how DRF works::

    import importlib

    from django.core.urlresolvers import resolve

    from rest_framework.mixins import ListModelMixin
    from rest_framework.response import Response

    from multicore import Task
    from multicore.utils import ranges, PicklableWSGIRequest


    def helper(request, start, end):
        # Re-resolve the view from the request path so the worker can
        # reconstruct enough DRF state to serialize its slice
        view_func, args, kwargs = resolve(request.get_full_path())
        module = importlib.import_module(view_func.__module__)
        view = getattr(module, view_func.__name__)()
        setattr(view, "request", request)
        view.format_kwarg = view.get_format_suffix()
        queryset = view.filter_queryset(view.get_queryset())
        serializer = view.get_serializer(queryset[start:end], many=True)
        return serializer.data


    def mylist(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
        task = Task()
        if task is not None:
            for start, end in ranges(queryset):
                task.run(
                    helper, PicklableWSGIRequest(request._request),
                    start, end
                )
            # Get the results and combine the lists
            results = [item for sublist in task.get() for item in sublist]
            return Response(results)
        else:
            serializer = self.get_serializer(queryset, many=True)
            results = serializer.data
            return Response(results)


    ListModelMixin.list = mylist
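A natural home for a monkey-patch like this is an ``AppConfig.ready`` hook, so it is applied exactly once at startup. The names below (``myapp``, ``patch_drf_list``) are hypothetical placeholders, not part of django-multicore::

    # myapp/apps.py -- "myapp" and patch_drf_list are hypothetical names
    from django.apps import AppConfig


    class MyAppConfig(AppConfig):
        name = "myapp"

        def ready(self):
            # Apply the ListModelMixin patch once, when Django starts
            from myapp.patches import patch_drf_list
            patch_drf_list()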
The ``run`` method takes an optional parameter ``serialization_format`` with value ``pickle`` (the default), ``json`` or ``string``. Pickle is slow but safe. If you know what type of data you have (you should!) set this as appropriate.
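For example, since ``multi_expensive_render`` in the earlier example returns a plain string, it could be submitted like this (a sketch, assuming ``serialization_format`` is passed as a keyword argument to ``run``)::

    # "string" tells multicore the return value is a plain string,
    # skipping the slower pickle round trip
    task.run(multi_expensive_render, start, end, serialization_format="string")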
The ``run`` method also takes an optional parameter ``use_dill`` with default value ``False``. Dill is a library that can often pickle things the standard pickler can't, but it is slightly slower.
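For example, the standard pickler can't serialize a lambda, but dill can (a sketch, assuming ``use_dill`` is likewise passed as a keyword argument to ``run``)::

    # A lambda can't be pickled by the standard pickler; dill handles it
    task.run(lambda start, end: str(end - start), start, end, use_dill=True)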
Settings
--------

If the system load average exceeds the ``max-load-average`` value then a multicore task won't be created, and your code must fall back to a synchronous code path. Note that this value is for a single-core machine and is automatically scaled to reflect the actual number of cores on the machine. A value of ``None`` (the default) always creates a multicore task::

    MULTICORE = {"max-load-average": 85}
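In code, the fallback looks like the pattern already used in the DRF example above: ``Task()`` evaluates to ``None`` when the load threshold is exceeded, so guard for it. This sketch reuses ``users`` and ``multi_expensive_render`` from the first example::

    from multicore import Task
    from multicore.utils import ranges

    task = Task()  # None when the load average threshold is exceeded
    if task is not None:
        for start, end in ranges(users):
            task.run(multi_expensive_render, start, end)
        result = ", ".join(task.get())
    else:
        # Synchronous fallback path
        result = ", ".join(
            multi_expensive_render(start, end) for start, end in ranges(users)
        )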
FAQs
----

**My webserver is already under load. How does this app help?**
Webservers typically run number-of-cores x 8 Django processes at around 70% load, because that leaves enough headroom while not wasting money on processes sitting idle.
If you have 4 cores and 4 cold requests arrive (requests that won't hit the Django cache and thus take longer to complete) then multicore won't help you. However, if fewer than 4 cold requests arrive then you have a core available to reduce the response time of each individual request.
**Will it try to execute hundreds of pieces of code in parallel?**
No. The worker pool has a fixed size and can only execute number-of-cores tasks in parallel. You may also set ``max-load-average`` as a further guard.
**Why didn't you use multiprocessing.Pool?**
It just has too many issues with Django when it comes to scoping. Even pipes and sockets introduce too much overhead, so memory mapping is used.
**Do you have any benchmarks?**
No, because this is just an interface, not a collection of parallel code.
Okay... the unit test is 3 times as fast on a quad-core machine, and the Django REST Framework code in this doc is 2 times as fast on the same machine. Note that the speedup is very dependent on the type of serializer and data.
In general the code scales nearly linearly if you don't access the database. Multicore itself adds about 5 milliseconds of overhead on my machine.
Multicore and Gunicorn
----------------------

In a production environment Django is typically run under a WSGI server such as Gunicorn, which pre-forks a user-defined number of workers. Each Gunicorn worker also gets its own pool of multicore workers, so you end up with many more Python processes. It's not possible to have all the Gunicorn workers share the multicore worker pool of the master Django process, because we use memory mapping for inter-process communication, not pipes.
In practice this is not a problem - performance remains very good.