asgiref

Keep thread pool executors around.

Open apollo13 opened this issue 2 years ago • 3 comments

I'd love to hear your thoughts on this one, @carltongibson and @andrewgodwin. It just seems so wasteful to recreate new threads all the time when we can just reuse them.

Given this obviously somewhat artificial change, the timings go down from 15 seconds to 8 seconds:

#!/usr/bin/env python3

import asyncio
from asgiref.sync import ThreadSensitiveContext, sync_to_async


def sync():
    pass


async def test():
    for i in range(100_000):
        async with ThreadSensitiveContext():
            await sync_to_async(sync)()


async def main():
    await asyncio.gather(test())


if __name__ == "__main__":
    import time

    s = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")

Completely unscientifically, this means that the time to create and destroy 100000 threads is roughly 7 seconds on my machine. I am not really sure what a good thread count would be; it might make sense to copy the default max_workers from ThreadPoolExecutor.
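The cost being discussed can be isolated from asgiref with a rough sketch that uses only concurrent.futures. It compares creating and shutting down a single-thread executor for every call against reusing one executor. The function names are made up for illustration, and the default_max_workers() helper only mirrors the default documented for Python 3.8 to 3.12, min(32, os.cpu_count() + 4); it is not asgiref code, and the timings will differ per machine.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def sync():
    pass


def fresh_executor_per_call(n):
    """Create and shut down a one-thread executor for every call (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(n):
        with ThreadPoolExecutor(max_workers=1) as executor:
            executor.submit(sync).result()
    return time.perf_counter() - start


def reused_executor(n, max_workers):
    """Run the same calls on a single long-lived executor (hypothetical helper)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for _ in range(n):
            executor.submit(sync).result()
    return time.perf_counter() - start


def default_max_workers():
    """Mirror the documented ThreadPoolExecutor default for Python 3.8 to 3.12."""
    return min(32, (os.cpu_count() or 1) + 4)


if __name__ == "__main__":
    n = 10_000
    workers = default_max_workers()
    fresh = fresh_executor_per_call(n)
    reused = reused_executor(n, workers)
    print(f"fresh executor per call: {fresh:0.2f} s")
    print(f"reused executor ({workers} workers): {reused:0.2f} s")
    print(f"approx. overhead per thread: {(fresh - reused) / n * 1e6:0.1f} microseconds")
```

The difference between the two timings divided by the number of calls gives a per-thread overhead estimate comparable to the one above, without the event loop in the way.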

apollo13 avatar Dec 12 '21 10:12 apollo13

The irony of pooling a threadpool is not lost on me, but if it provides significant enough performance benefits, maybe it's worth it. Any idea what % saving this is on a more traditional use case?

andrewgodwin avatar Dec 12 '21 19:12 andrewgodwin

Most likely negligible (when looking at the "raw" numbers -- 7 seconds for 100000 threads is about 70 microseconds per thread, way below a millisecond, if I did the math correctly ;)). That said, I do not know how much thread state Python/Django can accumulate that would require cleaning up and could increase that number. There is also the question of how well threads perform on other systems; I do not know much about Windows or macOS in that regard.

What it would allow for (in theory -- it currently breaks because django.db.connection is context-aware and not just a thread-local) is that you could run with a pool of, let's say, 10 threads and reuse connections (i.e. run with persistent connections).
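To illustrate the idea only, here is a minimal sketch with a plain threading.local and a fixed pool of 10 threads. FakeConnection, get_connection, handle_request and run are hypothetical stand-ins, not Django or asgiref APIs, and as noted above this does not work with Django today because its connection handling is context-aware.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()
_created = 0
_created_lock = threading.Lock()


class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.owner = threading.get_ident()


def get_connection():
    """Return this thread's connection, creating it on first use."""
    global _created
    conn = getattr(_local, "connection", None)
    if conn is None:
        conn = FakeConnection()
        _local.connection = conn
        with _created_lock:
            _created += 1
    return conn


def connections_created():
    """Total number of connections opened so far."""
    with _created_lock:
        return _created


def handle_request(request_id):
    """Pretend to serve a request using the worker thread's connection."""
    conn = get_connection()
    return request_id, conn.owner


def run(num_requests=1000, pool_size=10):
    """Serve all requests on a fixed pool so connections get reused."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(handle_request, range(num_requests)))


if __name__ == "__main__":
    results = run()
    print(f"requests served: {len(results)}")
    print(f"connections opened: {connections_created()}")
```

Because the worker threads outlive individual requests, at most one connection per thread is opened regardless of the number of requests. With a fresh thread per request, every request would open its own.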

apollo13 avatar Dec 12 '21 19:12 apollo13

Hm, yeah, the extra complexity for such a small performance gain does give me a little pause. If there's another convincing reason to add it, like the connections, that might change things though - but my understanding of the problem with connection management is that the same request needs to be on the same thread/transaction, and that's harder.

andrewgodwin avatar Dec 17 '21 19:12 andrewgodwin