tornado icon indicating copy to clipboard operation
tornado copied to clipboard

Support for free-threaded CPython

Open m-clare opened this issue 7 months ago • 16 comments

I'm filing this issue as a placeholder for potential work around supporting free-threaded (v3.13t +) Python.

The standard TODOs for adding free-threading support are:

  • [ ] Audit Python bindings and declare them free-threading compatible (xref https://py-free-threading.github.io/porting/#updating-extension-modules).
  • [ ] Run the test suite with pytest-run-parallel to find potential issues, and fix them.
  • [ ] Run the test suite under ThreadSanitizer. If possible, depends on how many dependencies there are and if they run under TSan.
  • [ ] Add cp313t-* to CI to build free-threading wheels.

For more details, please see the suggested plan of attack in the py-free-threading guide.

Note that this is the first time I've looked at this repo, so I might be missing known issues or code that needs closer inspection. Any suggestions here will be very useful.

m-clare avatar May 20 '25 14:05 m-clare

Tornado builds fine on the free-threaded build. I think @lysnikolaou looked at this a little already and has some thoughts but I'll let him clarify.

ngoldbaum avatar May 20 '25 14:05 ngoldbaum

Thanks @m-clare for opening this issue. It's nice to see a tracker issue, since there's a couple of things I wanted to write down, but I hadn't gotten around to opening one. I've been working on building free-threaded wheels and testing the free-threaded build on CI in #3494, but I didn't get too far.

There's test failures under the free-threaded build on Ubuntu and Windows, which one can examine more closely on the Github actions run against my fork. I didn't dive too deep into these, but I have ideantified at least three things that might warrant a closer look:

  • There are some warnings emitted under 3.13t that are not emitted under 3.13. This might be an issue in the warnings module itself (we know that that isn't thread safe), but it also might be multiple threads trying to attach to the same socket and that emitting a warning. This makes some tests fail, since they examine stdout/stderr to verify that nothing was logged.
  • There are some logs that look like what's reported here. I don't know what the issue here is, but this might need someone to dive deeper.
  • There's at least one place where a class attribute is used, which might lead to thread safety issues. AFAIU that class attribute is a dictionary and the class namespace itself does not change, which makes this less of a problem, but mutating that dictionary from multiple threads will lead to unexpected behavior. Maybe @bdarnell has more context on this?

lysnikolaou avatar May 26 '25 16:05 lysnikolaou

@lysnikolaou thanks for dropping this here. When I looked through the open issues/PRs, I missed that you'd titled it with 313t!

m-clare avatar May 27 '25 14:05 m-clare

Looks like the only error on ubuntu is the musllinux one which doesn't look like our fault. That's good, but not too surprising because our use of threads is minimal except on windows.

The warnings are RuntimeWarning: coroutine 'SelectorThread.__init__.<locals>.thread_manager_anext' was never awaited and Task was destroyed but it is pending! (for the same coroutine). This is only used on windows and I remember it being fairly subtle to get rid of those warnings in standard builds. It's sensitive to the behavior of GC and atexit callbacks. Skipping test_destructor_log for free-threaded builds on windows may be appropriate .

We do use class attributes and I've been careful about their thread-safety, although this is under-tested since we don't actually use threads much. Sometimes we use mutexes but more often we rely on the atomicity of dict operations. Doesn't the internal lock of dict make this as safe as it was before? Or do you have specific concerns here?

bdarnell avatar May 27 '25 14:05 bdarnell

Looks like the only error on ubuntu is the musllinux one which doesn't look like our fault. That's good, but not too surprising because our use of threads is minimal except on windows.

I have merged the fix for the musllinux issue upstream and backported to 3.13 and 3.14 https://github.com/python/cpython/issues/130522

We do use class attributes and I've been careful about their thread-safety, although this is under-tested since we don't actually use threads much. Sometimes we use mutexes but more often we rely on the atomicity of dict operations. Doesn't the internal lock of dict make this as safe as it was before? Or do you have specific concerns here?

Yes, it should be fine as dicts are thread safe.

kumaraditya303 avatar Aug 03 '25 06:08 kumaraditya303

I had another look at #3494 today since the musllinux issue was fixed. CI is green, however cibuildwheel fails on Windows with a test failure. The traceback looks like this:

  Traceback (most recent call last):
    File "C:\Users\runneradmin\AppData\Local\Temp\cibw-run-peuew8qd\cp313t-win32\venv-test\Lib\site-packages\tornado\httpclient.py", line 113, in __del__
    File "C:\Users\runneradmin\AppData\Local\Temp\cibw-run-peuew8qd\cp313t-win32\venv-test\Lib\site-packages\tornado\httpclient.py", line 118, in close
    File "C:\Users\runneradmin\AppData\Local\Temp\cibw-run-peuew8qd\cp313t-win32\venv-test\Lib\site-packages\tornado\simple_httpclient.py", line 157, in close
    File "C:\Users\runneradmin\AppData\Local\Temp\cibw-run-peuew8qd\cp313t-win32\venv-test\Lib\site-packages\tornado\httpclient.py", line 240, in close
    File "C:\Users\runneradmin\AppData\Local\pypa\cibuildwheel\Cache\nuget-cpython\pythonx86-freethreaded.3.13.3\tools\Lib\weakref.py", line 505, in pop
  TypeError: 'NoneType' object is not callable

Could this be another CPython bug @kumaraditya303?

lysnikolaou avatar Aug 04 '25 10:08 lysnikolaou

Could this be another CPython bug

That seems like a crash in weakref.WeakKeyDictionary which is not thread safe on 3.13, I had fixed the thread safety of it in 3.14. Does it happen on 3.14t too?

kumaraditya303 avatar Aug 04 '25 10:08 kumaraditya303

No, it doesn't happen under 3.14t, so it looks like that's okay. Do you have any suggestions for possible alternatives under 3.13? Also, under 3.14 there seems to be issues with warnings control. An ignore filter is not going though under all platforms.

HTTPServerRequest(protocol='http', host='127.0.0.1:40987', method='GET', uri='/set_deprecated', version='HTTP/1.1', remote_ip='127.0.0.1')
      Traceback (most recent call last):
        File "/tmp/tmp.n06jzX6s47/venv/lib/python3.14t/site-packages/tornado/web.py", line 1846, in _execute
          result = method(*self.path_args, **self.path_kwargs)
        File "/tmp/tmp.n06jzX6s47/venv/lib/python3.14t/site-packages/tornado/test/web_test.py", line 326, in get
          self.set_cookie("a", "b", HttpOnly=True, pATH="/foo")
          ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/tmp.n06jzX6s47/venv/lib/python3.14t/site-packages/tornado/web.py", line 724, in set_cookie
          warnings.warn(
          ~~~~~~~~~~~~~^
              f"Deprecated arguments to set_cookie: {set(kwargs.keys())} "
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
              "(should be lowercase)",
              ^^^^^^^^^^^^^^^^^^^^^^^^
              DeprecationWarning,
              ^^^^^^^^^^^^^^^^^^^
          )
          ^
      DeprecationWarning: Deprecated arguments to set_cookie: {'pATH', 'HttpOnly'} (should be lowercase)

lysnikolaou avatar Aug 04 '25 12:08 lysnikolaou

That looks related to the new context_aware_warnings flag. It looks like the context is now being captured earlier than expected. I'm a little surprised that this is the only problem we have with this change. I'm not sure exactly what needs to change in this test to fix the plumbing - we'll probably have to widen the scope of the warning filter.

bdarnell avatar Aug 05 '25 17:08 bdarnell

Yes, we've confirmed that this is related to the new context_aware_flag, since this is happening under the GIL-enabled build as well if that flag is set to true. This seems to be related to run_sync which is used in the tests and how the AsyncIOLoop wraps the actual asyncio event loop. Context is somehow not propagated correctly. @kumaraditya303 has ideas for what might be wrong.

Having said that, do you think this is a blocker for building free threaded wheels? It seems to me that we could disable this test and start building wheels, before we come back to it in a follow-up PR.

lysnikolaou avatar Aug 06 '25 08:08 lysnikolaou

Context is somehow not propagated correctly.

It's not that context is propagating incorrectly, it's that modifying a contextvar is semantically different from modifying global variables. We just need to modify the test so it works with either semantics.

Having said that, do you think this is a blocker for building free threaded wheels?

This is a test-only issue (in an unimportant test), so it's ok to skip this test (or set context_aware_warnings to false?) and enable the builds.

bdarnell avatar Aug 06 '25 13:08 bdarnell

It's not that context is propagating incorrectly, it's that modifying a contextvar is semantically different from modifying global variables. We just need to modify the test so it works with either semantics.

As far as I understand, this should be opaque to user code and catch_warnings, simplefilter and friends should continue to behave the same (at least in single-threaded code). So my feeling is that the test is okay as is and this is a bug somewhere deeper in AsyncIOLoop.

This is a test-only issue (in an unimportant test), so it's ok to skip this test (or set context_aware_warnings to false?) and enable the builds.

Opened #3525.

lysnikolaou avatar Aug 06 '25 13:08 lysnikolaou

As far as I understand, this should be opaque to user code and catch_warnings, simplefilter and friends should continue to behave the same (at least in single-threaded code). So my feeling is that the test is okay as is and this is a bug somewhere deeper in AsyncIOLoop.

The problem is that the server task (and associated event loop) are created in setUp, capturing that context. We then change the warning filter in the test method, which by this point is running in a different context. Everything worked when the warning filter was just a global, but now it doesn't because the contexts are separate. Contexts are doing what they're supposed to do, we just need to set the warning filter in a different place.

bdarnell avatar Aug 07 '25 21:08 bdarnell

@bdarnell Do you expect to cut a new release and upload free-threaded wheels to PyPI soon?

lysnikolaou avatar Oct 03 '25 14:10 lysnikolaou

No, I don't have any near-term plans for the next release. The existing release works fine for free-threading, right? You just have to build the extension from source.

bdarnell avatar Oct 03 '25 18:10 bdarnell

Understood.

The existing release works fine for free-threading, right? You just have to build the extension from source.

That's correct, yes.

lysnikolaou avatar Oct 03 '25 18:10 lysnikolaou