asgiref icon indicating copy to clipboard operation
asgiref copied to clipboard

sync_to_async for converting generators to async generators

Open devxpy opened this issue 4 years ago • 19 comments

Fixes https://github.com/django/asgiref/issues/142

devxpy avatar May 11 '20 21:05 devxpy

The test is failing on python3.5 because it doesn't have async generator support. What's the right thing to do here? Is throwing a RuntimeError fair game?

devxpy avatar May 11 '20 21:05 devxpy

Hm, asgiref does support Python 3.5 for now, so we need to do something, or do a major version bump. I think acceptable behaviour would be to throw a RuntimeError when you try to use this on 3.5, explaining the situation, and then write a test that's skipped on 3.5 for the main implementation, and then a test only for 3.5 that tests that the error is thrown.

andrewgodwin avatar May 12 '20 16:05 andrewgodwin

2c: perhaps a TypeError is a more appropriate exception class, since the error case is the decorator being applied to the wrong type of callable.

adamchainz avatar May 12 '20 16:05 adamchainz

I don't mind too much; I generally use RuntimeError for things a bit more serious/unsolvable, but this could go either way. We can't mirror the underlying error, anyway, since it's a SyntaxError, so that's at least out of the question.

andrewgodwin avatar May 12 '20 16:05 andrewgodwin

Actually, I think I can make this support python 3.5. Let me try my hand with __aiter__ and __anext__ from PEP 492.

devxpy avatar May 12 '20 16:05 devxpy

This should be good to go now.

devxpy avatar May 13 '20 15:05 devxpy

Is there some simpler version of this we could do if we drop support for some of the earlier Python versions asgiref currently supports?

If you're talking about the inspect.isgeneratorfunction() call, then, I don't think we're in luck here.

That seems to be the only way to detect if a given function will return a generator, without actually calling that function and checking the return value.

Let me know if you find something else!

devxpy avatar May 14 '20 17:05 devxpy

Quick question - what purpose does self._is_coroutine serve in SyncToAsync.__init__()?

devxpy avatar May 14 '20 17:05 devxpy

@devxpy That's needed to signal to asyncio.iscoroutinefunction that it should return True. We'll need to keep that for anything that outputs an async-like function.

andrewgodwin avatar May 14 '20 19:05 andrewgodwin

Okay, left that unchanged.

devxpy avatar May 14 '20 19:05 devxpy

Would you like performance tests written in pytest-benchmark?

devxpy avatar May 19 '20 10:05 devxpy

No, no, just a quick one-off test to make sure it's not massively worse. timeit or something will do and then paste the results here.

andrewgodwin avatar May 19 '20 17:05 andrewgodwin

This PR -

In [6]: from asgiref.sync import sync_to_async

In [4]: def fn():
   ...:     pass
   ...:

In [7]: %timeit sync_to_async(fn)
4.93 µs ± 70.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

devxpy avatar Aug 15 '20 17:08 devxpy

On current pypi version -

In [1]: from asgiref.sync import SyncToAsync

In [2]: def fn():
   ...:     pass
   ...:

In [3]: %timeit SyncToAsync(fn)
3.35 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

devxpy avatar Aug 15 '20 17:08 devxpy

Sorry this took longer than expected

devxpy avatar Aug 15 '20 17:08 devxpy

No worries - it might take me some time to get to this, unfortunately, as it's a very complex PR and reviewing it is going to take a lot of effort to make sure it doesn't subtly affect something else downstream (we may even need to cut a new major release just in case).

andrewgodwin avatar Aug 25 '20 21:08 andrewgodwin

Hi,

A few hopefully-helpful comments:

  1. I really don't like the kind of "magic" done here, to try to automatically determine if the thing is a function, generator, or iterable. In particular:
  • If an object is both callable and iterable (i.e. it has both call and iter methods), then there is no way to decide automatically if a function-wrapper or iterable-wrapper is desired. You need a way for the caller to say what they want, and separate function names is the clearest way to do that. (Perhaps sync_function_to_async and sync_iterable_to_async).
  • If a generator is wrapped somehow, then inspect.isgeneratorfunction() might not work. E.g. partial() on a generator isn't detected by inspect.isgeneratorfunction() in Python <3.8; there are likely plenty of decorators etc that have similar issues.
  • If a generator function takes a long time to run when first called (e.g. retrieving some data), but is non-blocking after that, the old behaviour is actually better. E.g. caller might do X=list(await sync_to_async(get_data)). Having the choice would be useful (e.g. having sync_function_to_async() give the old behaviour and sync_generator_to_async() the new behaviour).
  • To someone reading the code, it can require a lot of work to figure out which path is taken. If you have separate functions then it's obvious from the function name.
  1. The API allows iterables to be wrapped (i.e. things you can call iter() on) but not iterators (i.e. things you can call next() on). The code actually implements wrapping iterators, it's needed to support wrapping iterables. So it would be good to expose that functionality in the public API.

  2. Changing the behaviour of the existing function when passed generators is a non-backward-compatible change.

  3. The generator wrapping here doesn't handle all the complexities of generators. It just treats them as iterables, but they're actually a lot more complex. The caller can send a value into a generator (and it gets returned by the "yield" statement in the generator). The caller can make a "yield" statement throw an exception, and there is support for try/catch/finally around those yield statements. See PEP 342 for synchronous generator magic and PEP 525 for asynchronous generators.

So I suggest four separate new APIs - sync_function_to_async, sync_generator_to_async, sync_iterable_to_async, sync_iterator_to_async. Plus keep sync_to_async for backward compatibility, as an alias of sync_function_to_async.

If you really want a "magic" function that tries to autodetect what kind of wrapping to do, please make that a separate function.

Kind regards,

Jon

jonfoster avatar Oct 03 '20 21:10 jonfoster

I do agree somewhat with Jon here - the sheer work required to make this polymorphic to every possible callable type maybe takes away from its understandability (I've tried three times now to code review this and I'm finding it very tough). It might be better to go for a simpler, more bulletproof approach.

andrewgodwin avatar Oct 11 '20 22:10 andrewgodwin

As a note for other async newcomers like myself, by far the easiest way to deal with this is just to move the entire block using querysets to a separate, non-async function and then wrap that function call in a sync_to_async where you previously had the block. I wasted a huge amount of time thinking that I should be able to get it done in the async function, and it is almost certain to not be time worth spent for others either!

While that becomes clearer after you waste time then re-read https://docs.djangoproject.com/en/3.1/topics/async/, it might be nice to add a note to that page. Would that be a reasonable addition? Should I submit a PR?

AntonOfTheWoods avatar Jan 15 '21 23:01 AntonOfTheWoods

Looking at this, I'm quite sceptical that we should be going anywhere near this.

For simple cases PEP 492 has an example wrapping a sync iterator — Let users do that.

For more complex cases (where you're iterating out of a DB maybe) you don't want to go back and forth between sync and async contexts. Better to make people fetch-all upfront, or otherwise rewrite their code to be actually async, than provide a shim that looks like it's the right thing to do.

I'd lean towards Out-of-scope/wontfix and being happy with that TBH.

carltongibson avatar Dec 14 '22 15:12 carltongibson

Agreed - this has sat around for long enough because I simply don't think we can do a good enough job of it that will be less bad for users than saying "do the work and just make the whole loop async". There's a few cases we'll want to overcome - mapping streaming responses in Django comes to mind - but that kind of needs its own handler anyway.

andrewgodwin avatar Dec 14 '22 16:12 andrewgodwin

There's a few cases we'll want to overcome - mapping streaming responses in Django comes to mind - but that kind of needs its own handler anyway.

I don't think we should that either TBH.

See https://github.com/django/django/pull/16384 and discussion on the ticket there. (Your thoughts welcome 🙏)

carltongibson avatar Dec 14 '22 16:12 carltongibson

Ah, I did wonder, given our previous discussions on the subject. You did still suggest mapping the other case over, just with a lot of warnings, which is fine in my eyes :)

andrewgodwin avatar Dec 14 '22 16:12 andrewgodwin

That's the idea. There are a few wiggles still but I'm currently optimistic 😜

carltongibson avatar Dec 14 '22 16:12 carltongibson