autobahn-python icon indicating copy to clipboard operation
autobahn-python copied to clipboard

Asyncio protocol connection_lost handler causes infinite reconnect loop

Open siku2 opened this issue 6 years ago • 0 comments

I found myself unable to properly stop an asyncio WAMP component.

The following snippet written in Python 3.7 (but it should also run in 3.6) should demonstrate the issue:

import asyncio

import logging
import txaio
from autobahn.asyncio.component import Component


async def main():
    txaio.config.loop = asyncio.get_event_loop()
    log = logging.getLogger("main")

    component = Component(
        transports=[{
            "type": "rawsocket",
            "url": "rs://localhost:8333",   # ADJUST FOR ROUTER
        }],
        realm="internal",                   # ADJUST FOR ROUTER
    )

    @component.on_ready
    async def on_ready(*_):
        log.info("CONNECTED, STOPPING")
        await component.stop()

    log.info("starting")
    await component.start()
    log.info("done") # never reached


txaio.start_logging(level="trace")
asyncio.run(main())

Expected Behaviour

According to the documentation this should connect to the router, join the session and then immediately leave the session and stop the component.

Current Behaviour

Starting the component works as expected, but as soon as component.stop is called a TransportLost exception is raised which causes the component to enter the re-connect loop and finally reconnect. This cycle is then repeated forever.

Log

2019-07-30T16:41:01 Using selector: SelectSelector
2019-07-30T16:41:01 starting
2019-07-30T16:41:01 Using default loop
2019-07-30T16:41:01 Entering re-connect loop
2019-07-30T16:41:01 trying transport 0 using connect delay 0
2019-07-30T16:41:01 connecting once using transport type "rawsocket" over endpoint "tcp"
2019-07-30T16:41:01 RawSocker Asyncio: Connection made with peer tcp6:::1:8333
2019-07-30T16:41:01 RawSocket Asyncio: Client handshake sent
2019-07-30T16:41:01 RawSocker Asyncio: data received 7ff30000
2019-07-30T16:41:01 WampRawSocketProtocol: Handshake complete
2019-07-30T16:41:01 ApplicationSession started.
2019-07-30T16:41:01 WampRawSocketProtocol: TX WAMP message: Hello(realm=internal, roles={'subscriber': subscriber(publisher_identification=True, pattern_based_subscription=True, subscription_revocation=True, payload_transparency=True, payload_encryption_cryptobox=True), 'publisher': publisher(publisher_identification=True, subscriber_blackwhite_listing=True, publisher_exclusion=True, payload_transparency=True, x_acknowledged_event_delivery=True, payload_encryption_cryptobox=True), 'caller': caller(caller_identification=True, call_canceling=True, progressive_call_results=True, payload_transparency=True, payload_encryption_cryptobox=True), 'callee': callee(caller_identification=True, pattern_based_registration=True, shared_registration=True, call_canceling=True, progressive_call_results=True, registration_revocation=True, payload_transparency=True, payload_encryption_cryptobox=True)}, authmethods=None, authid=None, authrole=None, authextra=None, resumable=None, resume_session=None, resume_token=None)
2019-07-30T16:41:01 WampRawSocketProtocol: TX octets: <removed>
2019-07-30T16:41:01 RawSocker Asyncio: data received <removed>
2019-07-30T16:41:01 WampRawSocketProtocol: RX octets: <removed>
2019-07-30T16:41:01 WampRawSocketProtocol: RX WAMP message: Welcome(session=6981984150612075, roles={'broker': broker(publisher_identification=True, pattern_based_subscription=True, session_meta_api=True, subscription_meta_api=True, subscriber_blackwhite_listing=True, publisher_exclusion=True, subscription_revocation=True, payload_transparency=True, payload_encryption_cryptobox=True, event_retention=True), 'dealer': dealer(caller_identification=True, pattern_based_registration=True, session_meta_api=True, registration_meta_api=True, shared_registration=True, call_canceling=True, progressive_call_results=True, registration_revocation=True, payload_transparency=True, testament_meta_api=True, payload_encryption_cryptobox=True)}, realm=internal, authid=5MF5-TAYT-5TA3-5MRK-SMAP-LJ7E, authrole=anonymous, authmethod=anonymous, authprovider=static, authextra={'x_cb_node_id': None, 'x_cb_peer': 'tcp4:172.18.0.1:38658', 'x_cb_pid': 11}, resumed=None, resumable=None, resume_token=None)

2019-07-30T16:41:01 CONNECTED, STOPPING

2019-07-30T16:41:01 WampRawSocketProtocol: TX WAMP message: Goodbye(message=None, reason=wamp.close.normal, resumable=None)
2019-07-30T16:41:01 WampRawSocketProtocol: TX octets: <removed>
2019-07-30T16:41:01 RawSocker Asyncio: data received <removed>
2019-07-30T16:41:01 WampRawSocketProtocol: RX octets: <removed>
2019-07-30T16:41:01 WampRawSocketProtocol: RX WAMP message: Goodbye(message=None, reason=wamp.close.normal, resumable=None)
2019-07-30T16:41:01 RawSocker Asyncio: Connection lost
2019-07-30T16:41:01 session leaving 'wamp.close.normal'
2019-07-30T16:41:01 session on_disconnect: was_clean=True
2019-07-30T16:41:01 component failed: TransportLost: failed to complete connection
2019-07-30T16:41:01 Traceback (most recent call last):
  File "lib\site-packages\txaio\aio.py", line 514, in done
    res = f.result()
  File "lib\site-packages\txaio\aio.py", line 514, in done
    res = f.result()
autobahn.wamp.exception.TransportLost: failed to complete connection

2019-07-30T16:41:01 Connection failed: TransportLost: failed to complete connection
2019-07-30T16:41:01 Entering re-connect loop
2019-07-30T16:41:01 trying transport 0 using connect delay 2.3582158550393135
2019-07-30T16:41:04 connecting once using transport type "rawsocket" over endpoint "tcp"
2019-07-30T16:41:04 RawSocker Asyncio: Connection made with peer tcp6:::1:8333
2019-07-30T16:41:04 RawSocket Asyncio: Client handshake sent
2019-07-30T16:41:04 RawSocker Asyncio: data received 7ff30000
2019-07-30T16:41:04 WampRawSocketProtocol: Handshake complete
2019-07-30T16:41:04 ApplicationSession started.

Rinse and repeat.

Observations

Because the error is propagated to attempt_connect.notify_connect_error the issue can be "fixed" by treating the TransportLost exception as fatal (using the is_fatal predicate). However this treats "real transport lost" events as fatal too, which is less than ideal.

When the component is starting the protocol's connection_lost is wrapped by the _wrap_connection_future method.

This wrapper detects when the underlying transport is closed and rejects the done future with the TransportLost exception if the future isn't already done.

The problem is that this handler is called before the create_session.on_leave is triggered (which is responsible for resolving the done future normally). The done future receives the TransportLost exception before it ever gets the chance to be resolved normally.

Because the future didn't resolve but instead contains an exception, the attempt_connect.connect_error callback is triggered which treats the exception as if it were an exception caught during connecting, not disconnecting. This then triggers the re-connect logic.

The logs make this especially confusing because the on_leave and on_disconnect handlers' logs assume that the future isn't done yet.

I'm not (yet) familiar enough with the code base to suggest a proper solution/fix but I suspect it might be enough to change the order of events in the message.Goodbye branch of the session's onMessage method. Currently the onLeave method is called first (which is responsible for closing the underlying transport) and the "leave" event is triggered afterward. If instead the "leave" event was triggered first, the done future would resolve and when the connection is closed later on the connection_lost handler would no longer set the exception as the future is already complete.

Versions

  • autobahn: 19.7.2
  • router: Crossbar 19.6.1

siku2 avatar Jul 30 '19 15:07 siku2