autobahn-python icon indicating copy to clipboard operation
autobahn-python copied to clipboard

a Future constructor is missing the `loop` kwarg

Open petri opened this issue 9 years ago • 20 comments

There's a Future created by the _consume method of autobahn.asyncio.websocket.WebSocketAdapterProtocol.

It is not passing the loop kwarg, which leads to NotImplementedErrorwhen the implicit default 'global' loop is not being used. If I understand correctly.

I am guessing it's possible to reproduce this simply by creating a custom loop, rather than asking for the default via asyncio.get_event_loop(), and passing that to the connection factory.

I encountered the issue when creating what amounts to an autobahn websocket connection test using py.test (with help from the pytest-asyncio package that provides some useful asyncio testing fixtures). To repeat that, write a test function to make a websocket connection so that:

  • the test is decorated with the pytest.mark.asyncio decorator, with the forbid_global_loopparameter set to True)
  • the event_loop fixture is used for the test, and explicitly used as the loop passed to autobahn

So the 'stanza' for such a test becomes like this:

from pytest import mark

@mark.asyncio(forbid_global_loop=True)
async def test_ws_connect(event_loop):
   "some test that connects"
...

Then, upon connect, the NotImplementedError will be raised.

As for fixing the issue, the loop appears to be available in the protocol as self.transport._loop, and I can confirm that passing it as loopkwarg fixes the issue.

petri avatar Nov 02 '16 20:11 petri

Hmm, yeah that's wrong. It actually should be using txaio.create_future(), probably? (Although, inside autobahn.asyncio.* we can often get away with using asyncio-specifics).

If we keep the bare Future(), it should probably use txaio.config.loop in any case...which then makes the "loop=" kwarg a bit of a lie. I'm not actually sure why this websocket code isn't using txaio.create_future and friends already ...

Thanks for the pointers on your unit-test. We're not using pytest in this particular case, but do have a similar test-helper in txaio.testutil.replace_loop https://github.com/crossbario/txaio/blob/master/txaio/testutil.py#L33

@petri if you can post your code somewhere that'd be ideal :) (For your use-case, replacing "the" loop globally via txaio would work, correct? i.e. you don't actually need different loops per websocket listener/client?)

meejah avatar Nov 02 '16 21:11 meejah

I was having the exact some problem, but I think the loop needs to be referenced as self.factory.loop. Added a pull request that should resolve this issue.

soulmerge avatar Nov 03 '16 14:11 soulmerge

Yes, https://github.com/crossbario/autobahn-python/pull/748 may be "a" stop-gap solution, but we need some tests for this and also need to figure out the implications of not using the txaio stuff -- that is, most of Autobahn uses txaio so that we can have mostly the same code across Twisted and asyncio. The problem here is that this loop= code bypasses that.

Several options:

  • only use the loop= loop
  • if no loop= is passed, use txaio's loop (via txaio.config.loop) if one is configured
  • just always use txaio directly (i.e. no bare Future(..) use, switch to txaio.create_future() like elsewhere). This option would imply removing the loop= kwarg to these factories, IMO, which may be a compatibility problem.

meejah avatar Nov 03 '16 16:11 meejah

I have a use-case question, too: does anyone need a loop-per-WebSocket* factory, or is the ability to change "the" loop sufficient?

txaio is basically a lowest-common-denominator library between asyncio and Twisted -- and since Twisted only has "a" reactor loop currently, so does txaio.

meejah avatar Nov 03 '16 16:11 meejah

I thought txaio is only meant to be used to provide an abstraction layer so code can run unmodified both on asyncio and twisted? So why would it be needed in the asyncio-specific package?

petri avatar Nov 03 '16 17:11 petri

Yes, we can use asyncio-only APIs inside this asyncio-specific package. However, txaio has its own concept of "the" event loop and this stuff bypasses all that -- so needs some consideration.

That is, if you set up an event loop in txaio and then run this you'll be sad that your event-loop isn't getting used by these websocket connections. (Also, yields() in particular is exactly the same as txaio.is_future() but that's relatively minor)

meejah avatar Nov 03 '16 17:11 meejah

What I think we should do is: remove the loop= kwarg entirely, and fix this code so it uses the txaio-configured event-loop (if available) or the default event loop otherwise (whether through txaio APIs or straight Future() calls). Also add test-coverage.

...BUT that removes a currently-existing public API is mostly why I added the needs-discussion tag.

meejah avatar Nov 03 '16 17:11 meejah

Explicit pass of loop is de facto for asyncio applications. So another hard option:

  • refactor txaio for instantiate configured object with loop or reactor
  • call methods on this object instead on txaio module

laz2 avatar Nov 03 '16 17:11 laz2

Well, we can't have txaio.create_future(loop=..) because Twisted doesn't support that -- this is why txaio has one event-loop configured globally.

But, I guess you're suggesting some sort of intermediate object that gets passed a loop (and in Twisted case I guess it would just have to throw an exception if the loop isn't "the one true reactor"). e.g. something like txaio.with_loop(loop) or something, and the object that returns has all the API methods of txaio (create_future, etc).

I still ask, though: is there really a use-case for N loops? Or is "default loop or 1 explicit loop" sufficient? Can you point me to examples of asyncio code that use > 1 loop (and why they do that)? The use-case in this particular issue is for testing, which seems completely sufficient to have just one replacement loop.

Another possible solution:

  • use asyncio-specific code in this module (i.e. Future(loop=) directly
  • make sure that if no loop is passed (or None), we use txaio.config.loop (if it's not None)

This would, hopefully (and, should have tests!) mean that someone who never passed loop= but did use txaio to set an explicit loop would still get that txaio-known explicit loop but people who either currently use the loop= API or who need > 1 loop can do so. I still think this will lead to odd situations and weird edge-cases, though: any "common" code in the generic stuff will be using txaio.* and only see the one txaio loop (e.g. any code in autobahn.websocket.protocol).

meejah avatar Nov 03 '16 19:11 meejah

I don't know about the cross-platform compatibility, but the autobahn.asyncio.websocket.WebSocketServerFactory (which is a dedicated class for use with asyncio) is explicitly accepting a loop keyword in its constructor and the usage of the Future() without a loop keyword breaks the promise of that factory's constructor.

As for an example usage with multiple event loops: I am currently working on a deployment script that can manage multiple services, each running in a separate thread, and each service (possibly) using different main-loop implementations. So this library needs to be able to handle a deployment of (I'm making up an example scenario here) four different services, where one service implements a threaded web-server where the main thread just accepts connections and spawns a new thread for each connection, an FTP service that makes use of an ftp library that operates using calls to select.epoll(), and two other services that use asyncio, but possibly using different helper libraries.

The library in this scenario has no clue whatsoever what technologies are being used for each service and the services don't know about each other's implementations, either. This is why each of these service implementations must be able to manage a dedicated event loop in my case.

I encountered this bug during the implementation of an autobahn-powered websocket service.

soulmerge avatar Nov 03 '16 20:11 soulmerge

I agree this class is breaking the implied contract from the loop= kwarg. It's also breaking the contract of "I set up an alternate event-loop in txaio".

BOTH of the above need fixing, IMO. Possibly this means augmenting txaio to deal with multiple event-loops.

Is your example scenario covered if "all the Autobahn WebSockets use the custom event loop you gave to txaio"? (e.g. by setting txaio.config.loop = custom_loop)

meejah avatar Nov 03 '16 20:11 meejah

Not really. The issue for me is that there is a global event loop, which prevents me from running multiple loops in different threads.

Actually my requirements are even more restrictive than that: I cannot even work with a thread-local loop, since the loop might receive new tasks from other threads.

I'm aware, that these are pretty unusual constraints.

soulmerge avatar Nov 04 '16 02:11 soulmerge

Here is my view on this:

  • there are valid, though probably mostly uncommon use cases for multiple loops
  • Twisted made a historic design mistake in not allowing for multiple/different event loops, asyncio is doing it right
  • allowing Twisted to work with multiple event loops (reactors) would probably be a pretty big effort, so I wouldn't hold my breath

In principle there seem to be 3 options of what we could do:

a) remove loop argument in Autobahn factories, use the txaio configured (global) loop (which is fixed for Twisted, and which could be changed for asyncio - but globally) - I think this is what @meejah suggests b) add loop as an argument to all Autobahn factories, also for the Twisted ones, but for Twisted bail out if that is not "the one true reactor" - also hinted at by @meejah c) let Autobahn interfaces for Twisted vs asyncio classes diverge (like today)


a) covers more use cases: it would allow switching the one global event loop, but it does not allow multiple loops concurrently - neither in one and the same, nor in different threads (unless the loop in txaio would be stored in thread local storage - but that seems to be a can of worms).

b) covers all use cases (for asyncio). However, it would trigger a lot of change in Autobahn both in impl. and interfaces - with practically zero gain for Autobahn itself, and for most use cases.

c) is obviously not so cool. we already have a few places diverged (ApplicationRunner etc), and this doesn't feel right - it's a goal conflict: we want to paper over differences in Twisted/asyncio, without rewrapping everything and expose all framework specific extra features. Mission impossible.

In any case, I think this is an important discussion. Right now, I tend to a) ..

oberstet avatar Nov 04 '16 16:11 oberstet

I hadn't really considered a "thread-local" txaio loop (which obviously would just be "the" reactor for Twisted). I wonder: would this cover @soulmerge's use-cases?

Twisted I believe realizes the mistake of a global loop (and recent suggested-practice is to take a "reactor" argument for things that require it) -- but yes, it'll be a looong road to multiple reactors "actually" working in Twisted. I think the real mistake here is having an implicit reactor (which, unfortunately, asyncio also has).

Pragmatically, right now the only public API in Autobahn that accepts an extra loop= argument are the WebSocket factories -- and the merged #750 goes a decent way to at least making this "sort-of work when you squint" vs. txaio etc. (i.e. it fixes the original issue in this ticket). I don't believe anyone has ever asked for this from the WAMP factories.

As a refinement of "c" (basically @oberstet implies this mostly) we could do this: allow a loop= kwarg to txaio.create_future() (on all platforms). On Twisted, you get an exception if loop is not reactor or similar. On asyncio, one of two things could happen: either it's just blindly passed on to the Future() call using any global loop as a backup/default or it checks if there's a global txaio loop configured (and fails if you've configured a loop and passed in a loop). Then, we could selectively add ways to do custom loops for WebSocket (e.g. leave as now, probably) or WAMP protocols (would also take consideration for the Connection class -- which is probably where it make sense to do this for the WAMP side, instead of [or in addition to] loop= args on the WAMP factories).

I'd still like to see actual open-source asyncio code that does really use multiple loops ;)

(Personally, I'd recommend strongly against mixing threads and async code -- especially with WAMP, where you already have good tools to go multi-process which a) avoids confusion and b) gives way better performance). That said, asyncio does give you multiple event-loops out of the box (which is nice) and obviously people are using these even without threads.

meejah avatar Nov 04 '16 17:11 meejah

Personally, I'd recommend strongly against mixing threads and async code -- especially with WAMP

Absolutely. (few exceptions, eg implicit background thread pools to use block database client libraries)

Thread local storage: I was more mentioning this for the sake of completeness .. I would strongly argue against introducing that into txaio. Thread local storage can get tricky, very hard to debug and in particular when combined with event loops, the GIL, etc - so no, pls no;)

As far as I understand @soulmerge , it wouldn't be enough anyway: he wants multiple concurrently running loops in one thread - but I might be wrong here.

As a refinement of "c" (basically @oberstet implies this mostly) we could do this: allow a loop= kwarg to txaio.create_future() (on all platforms). ....

I like that: essentially, first add support for this in txaio (where changes are required anyway), and then selectively see where we use / expose that in Autobahn.

Where else within txaio interfaces would we need to add that? Essentially everything refering a loop/reactor in the implementation needs the additional kwargs.

Minor: should we go into that direction, a trivial, but annoying question: what's the parameter name? loop, reactor, loop_or_reactor, ..?

I'd still like to see actual open-source asyncio code that does really use multiple loops ;)

Yes, me too! Because of two reasons:

  • if we sunk work into this, we should at least make sure it actually covers the use cases supposed to be addressed by the change
  • we have too much to do, and though all this seems desirable from a design point of view, we need to concentrate to where the real gains for common use cases / most users are

oberstet avatar Nov 04 '16 17:11 oberstet

Things that use txaio.config.loop right now are:

  • txaio.create_future
  • txaio.as_future
  • txaio.call_later
  • txaio.make_batched_timer

(The above for the asyncio side). For Twisted, it's a subset of that -- just the call_later and make_batched_timer.

meejah avatar Nov 04 '16 17:11 meejah

In addition to the above, the web-socket base protocol uses make_batched_timer to reduce the number of timer objects we make when running with lots of connections. Currently there is one of these.

This would need to changed to be a per-loop batched-timer object. Twisted can use a single batched-timer (as now) and asyncio would need a per-loop implementation (so, push the batched-timer down to autobahn.{asycnio,twisted}.websocket from the generic class).

meejah avatar Dec 06 '16 00:12 meejah

Just hit this when moving a websocket server from the main thread to a background thread to enable keyboard interrupts to work on Windows. Never heard of txaio prior to this, and was definitely surprised to find its aio.config.loop attribute being used instead of the loop I provided to the factory. In my case there is still only a single loop in use, it just isn't the one from the main thread. Here's the relevant part of the callstack where I discovered what was going wrong:

event_loop.py, line 28, in run_loop loop.run_forever() base_events.py, line 309, in run_forever self._run_once() base_events.py, line 1217, in _run_once handle._run() events.py, line 136, in _run self._callback(*self._args) websocket.py, line 114, in process self._dataReceived(data) protocol.py, line 1174, in _dataReceived self.consumeData() protocol.py, line 1203, in consumeData self.processHandshake() protocol.py, line 2785, in processHandshake f = txaio.as_future(self.onConnect, request) aio.py, line 305, in as_future return create_future_success(res) aio.py, line 285, in create_future_success return create_future(result=result) aio.py, line 276, in create_future f = Future(loop=config.loop)

JoshHeitzman avatar Apr 06 '17 01:04 JoshHeitzman

asyncio is deprecating the loop kwarg (for removal in 3.10) throughout its api. It's also not recommended to call asyncio.Future() or asyncio.get_event_loop():

The rule of thumb is to never expose Future objects in user-facing APIs, and the recommended way to create a Future object is to call loop.create_future()

Because this function has rather complex behavior (especially when custom event loop policies are in use), using the get_running_loop() function is preferred to get_event_loop() in coroutines and callbacks.

Specifically futures should only be created with:

def create_future():
    return asyncio.get_running_loop().create_future()

graingert avatar Mar 09 '21 22:03 graingert

actually, we only want to determine "the default loop" when the user does not explicitly provides one. the other aspect: yeah, we should create future using the factory present on the loop object for that. and creating a future in autobahn should use the txaio function for that.

oberstet avatar Mar 10 '21 08:03 oberstet