Support for free-threaded CPython
I'm filing this issue as a placeholder for potential work around supporting free-threaded (v3.13t +) Python.
The standard TODOs for adding free-threading support are:
- [ ] Audit Python bindings and declare them free-threading compatible (xref https://py-free-threading.github.io/porting/#updating-extension-modules).
- [ ] Run the test suite with
pytest-run-parallelto find potential issues, and fix them. - [ ] Run the test suite under ThreadSanitizer. If possible, depends on how many dependencies there are and if they run under TSan.
- [ ] Add
cp313t-*to CI to build free-threading wheels.
For more details, please see the suggested plan of attack in the py-free-threading guide.
Note that this is the first time I've looked at this repo, so I might be missing known issues or code that needs closer inspection. Any suggestions here will be very useful.
Tornado builds fine on the free-threaded build. I think @lysnikolaou looked at this a little already and has some thoughts but I'll let him clarify.
Thanks @m-clare for opening this issue. It's nice to see a tracker issue, since there's a couple of things I wanted to write down, but I hadn't gotten around to opening one. I've been working on building free-threaded wheels and testing the free-threaded build on CI in #3494, but I didn't get too far.
There's test failures under the free-threaded build on Ubuntu and Windows, which one can examine more closely on the Github actions run against my fork. I didn't dive too deep into these, but I have ideantified at least three things that might warrant a closer look:
- There are some warnings emitted under 3.13t that are not emitted under 3.13. This might be an issue in the
warningsmodule itself (we know that that isn't thread safe), but it also might be multiple threads trying to attach to the same socket and that emitting a warning. This makes some tests fail, since they examinestdout/stderrto verify that nothing was logged. - There are some logs that look like what's reported here. I don't know what the issue here is, but this might need someone to dive deeper.
- There's at least one place where a class attribute is used, which might lead to thread safety issues. AFAIU that class attribute is a dictionary and the class namespace itself does not change, which makes this less of a problem, but mutating that dictionary from multiple threads will lead to unexpected behavior. Maybe @bdarnell has more context on this?
@lysnikolaou thanks for dropping this here. When I looked through the open issues/PRs, I missed that you'd titled it with 313t!
Looks like the only error on ubuntu is the musllinux one which doesn't look like our fault. That's good, but not too surprising because our use of threads is minimal except on windows.
The warnings are RuntimeWarning: coroutine 'SelectorThread.__init__.<locals>.thread_manager_anext' was never awaited and Task was destroyed but it is pending! (for the same coroutine). This is only used on windows and I remember it being fairly subtle to get rid of those warnings in standard builds. It's sensitive to the behavior of GC and atexit callbacks. Skipping test_destructor_log for free-threaded builds on windows may be appropriate .
We do use class attributes and I've been careful about their thread-safety, although this is under-tested since we don't actually use threads much. Sometimes we use mutexes but more often we rely on the atomicity of dict operations. Doesn't the internal lock of dict make this as safe as it was before? Or do you have specific concerns here?
Looks like the only error on ubuntu is the musllinux one which doesn't look like our fault. That's good, but not too surprising because our use of threads is minimal except on windows.
I have merged the fix for the musllinux issue upstream and backported to 3.13 and 3.14 https://github.com/python/cpython/issues/130522
We do use class attributes and I've been careful about their thread-safety, although this is under-tested since we don't actually use threads much. Sometimes we use mutexes but more often we rely on the atomicity of dict operations. Doesn't the internal lock of dict make this as safe as it was before? Or do you have specific concerns here?
Yes, it should be fine as dicts are thread safe.
I had another look at #3494 today since the musllinux issue was fixed. CI is green, however cibuildwheel fails on Windows with a test failure. The traceback looks like this:
Traceback (most recent call last):
File "C:\Users\runneradmin\AppData\Local\Temp\cibw-run-peuew8qd\cp313t-win32\venv-test\Lib\site-packages\tornado\httpclient.py", line 113, in __del__
File "C:\Users\runneradmin\AppData\Local\Temp\cibw-run-peuew8qd\cp313t-win32\venv-test\Lib\site-packages\tornado\httpclient.py", line 118, in close
File "C:\Users\runneradmin\AppData\Local\Temp\cibw-run-peuew8qd\cp313t-win32\venv-test\Lib\site-packages\tornado\simple_httpclient.py", line 157, in close
File "C:\Users\runneradmin\AppData\Local\Temp\cibw-run-peuew8qd\cp313t-win32\venv-test\Lib\site-packages\tornado\httpclient.py", line 240, in close
File "C:\Users\runneradmin\AppData\Local\pypa\cibuildwheel\Cache\nuget-cpython\pythonx86-freethreaded.3.13.3\tools\Lib\weakref.py", line 505, in pop
TypeError: 'NoneType' object is not callable
Could this be another CPython bug @kumaraditya303?
Could this be another CPython bug
That seems like a crash in weakref.WeakKeyDictionary which is not thread safe on 3.13, I had fixed the thread safety of it in 3.14. Does it happen on 3.14t too?
No, it doesn't happen under 3.14t, so it looks like that's okay. Do you have any suggestions for possible alternatives under 3.13? Also, under 3.14 there seems to be issues with warnings control. An ignore filter is not going though under all platforms.
HTTPServerRequest(protocol='http', host='127.0.0.1:40987', method='GET', uri='/set_deprecated', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
File "/tmp/tmp.n06jzX6s47/venv/lib/python3.14t/site-packages/tornado/web.py", line 1846, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/tmp/tmp.n06jzX6s47/venv/lib/python3.14t/site-packages/tornado/test/web_test.py", line 326, in get
self.set_cookie("a", "b", HttpOnly=True, pATH="/foo")
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.n06jzX6s47/venv/lib/python3.14t/site-packages/tornado/web.py", line 724, in set_cookie
warnings.warn(
~~~~~~~~~~~~~^
f"Deprecated arguments to set_cookie: {set(kwargs.keys())} "
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"(should be lowercase)",
^^^^^^^^^^^^^^^^^^^^^^^^
DeprecationWarning,
^^^^^^^^^^^^^^^^^^^
)
^
DeprecationWarning: Deprecated arguments to set_cookie: {'pATH', 'HttpOnly'} (should be lowercase)
That looks related to the new context_aware_warnings flag. It looks like the context is now being captured earlier than expected. I'm a little surprised that this is the only problem we have with this change. I'm not sure exactly what needs to change in this test to fix the plumbing - we'll probably have to widen the scope of the warning filter.
Yes, we've confirmed that this is related to the new context_aware_flag, since this is happening under the GIL-enabled build as well if that flag is set to true. This seems to be related to run_sync which is used in the tests and how the AsyncIOLoop wraps the actual asyncio event loop. Context is somehow not propagated correctly. @kumaraditya303 has ideas for what might be wrong.
Having said that, do you think this is a blocker for building free threaded wheels? It seems to me that we could disable this test and start building wheels, before we come back to it in a follow-up PR.
Context is somehow not propagated correctly.
It's not that context is propagating incorrectly, it's that modifying a contextvar is semantically different from modifying global variables. We just need to modify the test so it works with either semantics.
Having said that, do you think this is a blocker for building free threaded wheels?
This is a test-only issue (in an unimportant test), so it's ok to skip this test (or set context_aware_warnings to false?) and enable the builds.
It's not that context is propagating incorrectly, it's that modifying a contextvar is semantically different from modifying global variables. We just need to modify the test so it works with either semantics.
As far as I understand, this should be opaque to user code and catch_warnings, simplefilter and friends should continue to behave the same (at least in single-threaded code). So my feeling is that the test is okay as is and this is a bug somewhere deeper in AsyncIOLoop.
This is a test-only issue (in an unimportant test), so it's ok to skip this test (or set context_aware_warnings to false?) and enable the builds.
Opened #3525.
As far as I understand, this should be opaque to user code and catch_warnings, simplefilter and friends should continue to behave the same (at least in single-threaded code). So my feeling is that the test is okay as is and this is a bug somewhere deeper in AsyncIOLoop.
The problem is that the server task (and associated event loop) are created in setUp, capturing that context. We then change the warning filter in the test method, which by this point is running in a different context. Everything worked when the warning filter was just a global, but now it doesn't because the contexts are separate. Contexts are doing what they're supposed to do, we just need to set the warning filter in a different place.
@bdarnell Do you expect to cut a new release and upload free-threaded wheels to PyPI soon?
No, I don't have any near-term plans for the next release. The existing release works fine for free-threading, right? You just have to build the extension from source.
Understood.
The existing release works fine for free-threading, right? You just have to build the extension from source.
That's correct, yes.