
Race condition: `_user_connected` set after send/recv loops start, causing immediate loop exit in Textual's event loop

Open • DoubleBoba opened this issue 2 weeks ago • 1 comment

Code that causes the issue

"""
Minimal reproduction of race condition in MTProtoSender when used with Textual.
"""

import logging
from telethon import TelegramClient
from textual.app import App

logging.basicConfig(level=logging.DEBUG, format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")

API_ID = 123
API_HASH = "hash"


class TestApp(App):
    async def on_mount(self) -> None:
        client = TelegramClient("test_race", API_ID, API_HASH)
        try:
            print("Connecting to Telegram...")
            await client.connect()
            print("connect() OK") # Hangs if bug present
            me = await client.get_me()  
            print(f"get_me() returned: {me}")
        finally:
            await client.disconnect()
        self.exit()


if __name__ == "__main__":
    TestApp().run(headless=True)

Expected behavior

When client.connect() is called:

  1. The _send_loop and _recv_loop tasks are created
  2. When these loops start executing, _user_connected should already be True
  3. The loops should continue running and handle RPC requests/responses

Reproduction steps:

  1. Create a Textual TUI app that uses TelegramClient
  2. Call client.connect() inside the Textual app (e.g., in on_mount)
  3. The connection appears to succeed, but RPC calls hang indefinitely

Note: This issue does NOT reproduce with plain asyncio.run(). It only occurs inside Textual's event loop, where newly created tasks appear to start running immediately, inside the create_task() call itself, instead of waiting for the current coroutine to suspend.

Actual behavior

The race condition in MTProtoSender:

  1. connect() calls await self._connect() (line 133)
  2. Inside _connect(), the loop tasks are created with create_task() (lines 274, 277)
  3. In Textual's event loop, each task starts running right away, before _connect() has returned to connect()
  4. _send_loop/_recv_loop check while self._user_connected, and it's still False!
  5. Both loops exit immediately without processing anything
  6. Only after _connect() returns does connect() set _user_connected = True (line 134), which is too late

Consequence: RPC calls hang forever because the recv loop has already exited. The connection appears successful, but no responses are ever received.

Traceback

No crash — the issue manifests as RPC calls hanging/timing out because the receive loop silently exited.

Telethon version

1.42.0

Python version

Python 3.14.0

Operating system (including distribution name and version)

macOS 15.7.2 (Sequoia)

Other details

Root cause

The issue is in mtprotosender.py. The _user_connected flag is set in connect() AFTER _connect() returns:

# mtprotosender.py lines 127-135
async def connect(self, connection):
    async with self._connect_lock:
        if self._user_connected:
            return False
        self._connection = connection
        await self._connect()           # <-- tasks created here
        self._user_connected = True     # <-- flag set here (too late!)
        return True

But _connect() creates the loop tasks before returning:

# mtprotosender.py lines 272-278
loop = helpers.get_running_loop()
self._log.debug('Starting send loop')
self._send_loop_handle = loop.create_task(self._send_loop())

self._log.debug('Starting receive loop')
self._recv_loop_handle = loop.create_task(self._recv_loop())

Why Textual triggers this but asyncio.run() doesn't

With plain asyncio.run(), create_task() only schedules the new task; it doesn't run until the currently running coroutine suspends back to the event loop. Returning from await self._connect() is not a suspension point, so by the time the loops first run, _user_connected = True has already been set.

Under Textual, the tasks instead start executing as soon as create_task() is called, before _connect() has even returned. The debug log below shows this: the receive loop reports _user_connected=False before the "Connection ... complete!" message, which _connect() logs after it has created both tasks. That ordering is what an eager task factory produces (asyncio.eager_task_factory, Python 3.12+), where a task runs synchronously until its first suspension; I haven't confirmed whether Textual installs one or whether something else causes it.

This is valid asyncio behavior: exactly when a new task first runs depends on the event loop and its task factory, so code that creates a task shouldn't rely on the creating coroutine finishing its current step first.

Debug logging to observe the race condition

To observe this bug, I added diagnostic logging to mtprotosender.py:

At line 511 (at the start of _recv_loop(), before the while loop):

async def _recv_loop(self):
    """
    This loop is responsible for reading all incoming responses
    from the network, decrypting and handling or dispatching them.

    Besides `connect`, only this method ever receives data.
    """
    # ADD THIS LINE:
    self._log.info(f"Starting receive loop _user_connected={self._user_connected} _reconnecting={self._reconnecting}")
    while self._user_connected and not self._reconnecting:

At line 135 (after _user_connected = True in connect()):

        self._connection = connection
        await self._connect()
        self._user_connected = True
        # ADD THIS LINE:
        self._log.info(f"User connected! _user_connected={self._user_connected} _reconnecting={self._reconnecting}")
        return True

Output when running with Textual (BUG):

2025-12-07 16:30:25,321 [DEBUG] Starting send loop
2025-12-07 16:30:25,321 [DEBUG] Starting receive loop
2025-12-07 16:30:25,321 [INFO] Starting receive loop _user_connected=False _reconnecting=False  <-- BUG!
2025-12-07 16:30:25,321 [INFO] Connection to 149.154.167.51:443/TcpFull complete!
2025-12-07 16:30:25,321 [INFO] User connected! _user_connected=True _reconnecting=False  <-- TOO LATE

Output when running with plain asyncio.run() (OK):

2025-12-07 16:33:02,348 [INFO] Connection to 149.154.167.51:443/TcpFull complete!
2025-12-07 16:33:02,348 [INFO] User connected! _user_connected=True _reconnecting=False  <-- SET FIRST
2025-12-07 16:33:02,350 [INFO] Starting receive loop _user_connected=True _reconnecting=False  <-- CORRECT

Current workaround

Monkey-patch MTProtoSender._connect to set the flag before calling the original:

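from telethon.network.mtprotosender import MTProtoSender

# Keep a reference to the original coroutine function
original_connect = MTProtoSender._connect

async def patched_connect(self):
    # Set the flag first so the send/recv loops see True when they start,
    # even if the event loop runs them immediately inside create_task()
    self._user_connected = True
    try:
        await original_connect(self)
    except BaseException:
        # Roll back on any failure, including cancellation (CancelledError)
        self._user_connected = False
        raise

# Apply before calling client.connect()
MTProtoSender._connect = patched_connect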

Checklist

  • [x] The error is in the library's code, and not in my own.
  • [x] I have searched for this issue before posting it and there isn't an open duplicate.
  • [x] I ran pip install -U https://github.com/LonamiWebs/Telethon/archive/v1.zip and triggered the bug in the latest version.

DoubleBoba • Dec 07 '25 15:12

Thanks for the detailed report. As you seem to have it all worked out, would you like to submit a PR? Otherwise I'm not sure when I'll get around to fixing it.

Lonami • Dec 08 '25 17:12