
Ecosystem-wide thread safety documentation standards

Open ngoldbaum opened this issue 3 months ago • 10 comments

CPython itself and the ecosystem as a whole need better documentation on thread safety guarantees. In its acceptance of PEP 779, the Python Steering Council asked for the following documentation tasks to be addressed before free-threading moves to the next phase of support:

Documentation expectations

Documentation must be clearly written and maintained.

  • For Python users: What guarantees exist, and how are they affected by the free-threaded build on all APIs in all modules of the standard library.
  • For both Python and C API developers: Documentation on signal-safety, thread-safety, and other concurrency-related guarantees in all APIs that are publicly exposed without exceptions.
  • For CPython developers: Documentation on the impact of free-threading and how it should be taken into consideration while working on the language implementation. We recommend a central “free threading landing page”, location to be decided, which provides a guide to all the disparate documentation, PEPs, timelines, other decisions, and information regarding the free threading feature in Python. If https://py-free-threading.github.io/ is that site, we recommend making it an official page and improving its discoverability and visibility (e.g. possibly moving to the python.org domain).

To accomplish this, we need to look at addressing thread safety documentation in both CPython itself and the ecosystem. It's likely that what's needed for CPython will look different than what's needed for the ecosystem, but we should probably consider both simultaneously because there will likely be a lot of overlap in needs and approaches.

For the ecosystem:

  • Investigate adding thread safety as a "standard" section or admonition in API docs. Explore special markup in the numpydoc Sphinx extension, docutils, and Sphinx itself.
  • Survey thread safety documentation in a range of popular projects.
  • Write up suggestions and best practices for documenting thread safety.
  • Expand thread safety docs as needed.
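The first bullet could be made concrete with a docstring like the following. This is a minimal sketch, not a proposed standard: `LRUCache` is a hypothetical class, and because numpydoc has no dedicated thread-safety section today, the guarantee is stated inside the existing numpydoc `Notes` section and backed by an internal lock.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """A small least-recently-used cache (hypothetical example).

    Parameters
    ----------
    maxsize : int
        Maximum number of entries kept.

    Notes
    -----
    Thread safety: ``get`` and ``put`` may be called concurrently from
    multiple threads; each call holds an internal lock for its duration.
    """

    def __init__(self, maxsize=128):
        self._maxsize = maxsize
        self._data = OrderedDict()
        self._lock = threading.Lock()  # backs the documented guarantee

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used
```

The point of the sketch is the docstring shape: a reader searching for "thread safety" finds the guarantee next to the API it describes, rather than in a separate narrative page.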

For CPython:

  • Add infrastructure and an outline for documentation on thread safety of the Python and C APIs.
  • Establish what the thread safety guarantees are.
  • Figure out how to write down guarantees without hamstringing future improvements.
  • Figure out how to document modules that are intentionally not thread-safe or have not yet been made thread-safe.
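For objects that are intentionally not thread-safe, one mitigation such docs could point users to is giving each thread its own instance via `threading.local`. A minimal sketch; `StatefulParser` and `get_parser` are hypothetical names used only for illustration, not an existing API.

```python
import threading


class StatefulParser:
    """Hypothetical parser that keeps mutable state between calls.

    Not thread-safe: do not share one instance between threads.
    """

    def __init__(self):
        self.buffer = []

    def feed(self, chunk):
        self.buffer.append(chunk)
        return "".join(self.buffer)


# Each thread sees its own attributes on this object.
_local = threading.local()


def get_parser():
    """Return a StatefulParser private to the calling thread."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = StatefulParser()
    return parser
```

Each thread that calls `get_parser()` lazily creates and then reuses its own parser, so no single instance is ever shared across threads.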

ngoldbaum • Sep 26 '25 16:09

Last week, we sat down with @ngoldbaum and @willingc to come up with a proposal on how to move forward with this. We've been focusing on addressing the SC's documentation requirements for free-threading and providing a collection of good resources while allowing for easy navigation and discoverability. We currently have py-free-threading, which has been valuable for ecosystem tracking and early documentation, but based on the SC's acceptance of PEP 779, we need to evolve it into an official documentation site that serves as the central landing page they requested.

Why evolve beyond py-free-threading.github.io?

The SC specifically mentioned that if py-free-threading.github.io is to be the central site, it should become official and move to the python.org domain. In general, we think that the official documentation should include the following:

  • CPython's own thread-safety guarantees (builtins, stdlib, C APIs)
  • Installation across all distribution channels
  • Terminology and parallel programming primers
  • Performance considerations
  • Clear persona-based navigation

Why not integrate into the main CPython docs?

  • We're using mkdocs for this site, giving us flexibility to iterate quickly without being bound to Sphinx/RST
  • Free-threading documentation needs to move faster than the main CPython docs release cycle allows
  • The broader scope (ecosystem-wide concerns, installation across many channels, persona-based navigation) fits better as a standalone site
  • We're following the pattern of other focused Python documentation sites (packaging.python.org, devguide.python.org, etc.)

Proposed structure

We've drafted an outline that would incorporate and expand on existing py-free-threading content, organized around user personas:

1. High-level overview (landing page)

  • What is free-threading and why is it exciting?
  • Persona-based navigation (see below)
  • Links to PEPs, talks, and resources

2. Installation

  • Official binaries, building from source
  • Third-party: Linux distros, Homebrew, pyenv, uv/pixi, conda-forge, Jupyter
  • (Migrates/expands current installation docs)

3. Terminology

  • Thread-safety, race conditions, atomicity, sequential consistency
  • Mutable vs immutable types, global state, deadlocks
  • (New content akin to the Python glossary)

4. Guarantees of various builtins

  • Container types (atomicity guarantees and exceptions)
  • Iterator types, built-in functions
  • Import system and per-module import locks, GC behavior
  • (New CPython-specific content)

5. Guarantees in the standard library

  • Per-module impact and guarantees
  • Special cases (ctypes, gc, debugging/profiling tools)
  • (New CPython-specific content)

6. C APIs

  • Thread-safety guarantees for public C APIs
  • Stable ABI considerations
  • Critical sections and lock usage patterns
  • ABI considerations, building extensions
  • (New CPython-specific content)

7. Performance

  • Reference counting contention, interpreter overhead
  • Parallel programming best practices
  • (Expands on existing content)

8. Testing

  • Setting up CI for free-threading
  • Testing strategies for thread-safety
  • Common testing pitfalls
  • (Expands on existing content)
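As an illustration of the "testing strategies" item: a common approach is to release many threads at the same moment with a `threading.Barrier`, so the code under test overlaps as much as possible, and then check an invariant. A minimal, framework-free sketch; `run_concurrently` is a hypothetical helper, not an existing API.

```python
import threading


def run_concurrently(func, n_threads=8):
    """Call ``func()`` once in each of ``n_threads`` threads, started together.

    Returns the list of return values and re-raises the first exception seen.
    """
    barrier = threading.Barrier(n_threads)
    results = [None] * n_threads
    errors = []

    def worker(i):
        # Block until every thread has arrived, then run func at the same
        # time to maximize the chance of exposing a race.
        barrier.wait()
        try:
            results[i] = func()
        except Exception as exc:  # collected and re-raised in the caller
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return results


# Usage: check an invariant on shared state after a concurrent run.
# Assumes list.append is safe to call concurrently (CPython guards it on
# both the GIL-enabled and the free-threaded build).
shared = []


def append_many():
    for i in range(1_000):
        shared.append(i)


run_concurrently(append_many)
```

A single passing run does not prove the absence of races, so in CI such tests are typically repeated many times and combined with tools like ThreadSanitizer.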

Target personas

In the landing page, we could direct each persona to the documentation section that's most relevant for them (like we've been doing on py-free-threading). We're thinking of the following personas:

  • Python application developers using third-party libraries
  • Pure Python package maintainers (no public API)
  • Pure Python package maintainers (with public API)
  • C extension authors

Open questions and feedback

  1. Hosting: We propose free-threading.python.org (open to bikeshedding the subdomain)

    • The SC explicitly suggested moving to the python.org domain for visibility and discoverability
    • Keeps it conceptually separate from main CPython docs (it's broader in scope)
    • Easier to remember and share than a subdirectory path
    • Follows pattern of other Python community sites (packaging.python.org, etc.)
  2. Repository: We propose a new repository under the python GitHub organization (e.g., python/free-threading-docs)

    • py-free-threading.github.io is currently under a personal/org namespace - moving to python org makes it official
    • Can still use python.org infrastructure
  3. Scope: We propose focusing on thread-safety guarantees for now while already planning for expanding on how parallel programming interacts with the free-threaded build in the future

    • Primary focus should be thread-safety guarantees (SC requirement)
    • Don't reinvent parallel programming tutorials - reference existing resources
    • Expand on how standard parallel programming patterns interact specifically with CPython's free-threaded build later
  4. Content: Gather more feedback

    • Get feedback on the proposed outline
    • Align on whether this addresses the SC's concerns
    • Are there missing personas or topics? Something we missed altogether?

cc @Yhg1s

lysnikolaou • Oct 14 '25 18:10

Thanks for getting the ball rolling here @lysnikolaou!

This proposal wasn't quite what I was expecting after our in-person discussions and after gh-249. I think there are two separate questions:

  1. How to do ecosystem-wide docs (likely answered by making the py-free-threading site "more official" indeed)
  2. Improvements needed to the CPython docs themselves (@Yhg1s, @willingc and @AA-Turner were all focused on this part)

Over time, I'd imagine that a significant part of the content in (1) would slowly migrate to (2). I imagine that the end state - when free-threading becomes the default - will require that at least some tutorials, narrative content and API docs are integrated in the CPython docs and not kept separate.

The proposed structure 1-8 here is a mix. Some parts clearly belong in the separate site, however topics like "Guarantees of various builtins", "Guarantees in the standard library" and "C APIs" seem to be squarely in scope for the CPython docs.

Let's discuss how to update this plan to talk about the final state first, what exactly will be needed, what goes where etc. And then after that, what are the intermediate states and how do we go about making updates to move from current to intermediate to final state.

rgommers • Oct 15 '25 11:10

One specific thing that will need spelling out in a lot more detail is the set of principles for what to document in the standard library docs. From what I understood, one of the main concerns from the CPython docs team was that almost all functions/modules would get a ton of thread safety notes, which would be way too much detail and work at this point. Such notes should not be needed for the vast majority of functionality, but they are needed when some object or function deviates from the expected default behavior.

rgommers • Oct 15 '25 11:10

A couple of notes I started writing before @rgommers replied:

On Installation instructions

I want to be careful about "Installation across all distribution channels"; I find the current page is already quite heavy. If it's targeted at newcomers, they likely cannot decide which method to use; if it's targeted at advanced users, they likely already know how to install it.

In addition, in a guide meant to be read through, multi-page sections that describe each and every alternative are often not useful.

I would much prefer focusing on one install method, the same one as docs.python.org, and linking to alternative articles/docs maintained by the respective communities, as the https://docs.python.org/3/using/unix.html#on-linux docs do: "If you use X, see x.org/free-threading-install/; if you use Y, see y.org/...". This relieves us of many PR reviews, of judging instructions for platforms we might not be familiar with, and of stale content and issues.

On order of sections,

I'd like to have "testing" earlier, in the vein of "make it work, make it right, make it fast" (in that order). I think it's important to tackle correctness first; in particular, some maintainers will come to the guide first to publish a wheel that works for their users (and does not re-enable the GIL) before looking for performance improvements.

On Personas

Which category would a data scientist using, say, Jupyter fall into? "Python application developers using third-party libraries"?

Misc

Free-threading vs./with async, parallelism vs. concurrency. When async/await was added to Python, there were a lot of comments along the lines of "just use threads" and "concurrency and parallelism are the same thing".
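A minimal sketch of how the two compose, using only the standard library: `asyncio` provides concurrency on a single event-loop thread, while `asyncio.to_thread` (Python 3.9+) hands blocking or CPU-bound calls to worker threads. Pure-Python CPU-bound calls in those threads only run in parallel on the free-threaded build. `cpu_bound` is a placeholder workload.

```python
import asyncio


def cpu_bound(n):
    # Placeholder pure-Python workload.
    return sum(i * i for i in range(n))


async def main():
    # Each to_thread call runs cpu_bound in the default thread pool
    # executor; the event loop stays free to run other coroutines meanwhile.
    tasks = [asyncio.to_thread(cpu_bound, 100_000) for _ in range(4)]
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
```

On the GIL-enabled build this code gives concurrency but not parallel speedup for pure-Python work; on the free-threaded build the same code can use several cores, which is one way the "concurrency vs. parallelism" distinction becomes concrete.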

Even if this is a separate topic, maybe add a section saying that maintainers should document the free-threading guarantees of their package, and also a bit on packaging (what is currently at https://py-free-threading.github.io/porting/#define-and-document-thread-safety-guarantees).

EDIT: Also, building with Sphinx so that there are intersphinx targets for all the scientific Python projects using Sphinx would be great.

Carreau • Oct 15 '25 11:10

Not sure where it fits in the global picture, but this document needs an update: https://docs.python.org/3/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe. It still says the GIL synchronizes multithreaded access and doesn't discuss the free-threaded build. It would be nice to write something that doesn't assume the GIL is there to provide synchronization and that will be correct on the free-threaded and GIL-enabled build.
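As an illustration of an entry that holds on both builds without leaning on the GIL: `counter += 1` is a read-modify-write (load, add, store), so two threads can interleave and lose an update, while an explicit `threading.Lock` turns the whole sequence into one critical section. A minimal sketch, not proposed FAQ text.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, two threads can both read the same value of
        # `counter` and one of the increments is lost.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```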

ngoldbaum • Oct 15 '25 17:10

The proposed structure 1-8 here is a mix. Some parts clearly belong in the separate site, however topics like "Guarantees of various builtins", "Guarantees in the standard library" and "C APIs" seem to be squarely in scope for the CPython docs.

@rgommers When Lysandros, Nathan, and I spoke, I stated that I really wanted to see a lot more detail about documentation needs before migrating pages to the CPython docs. I had a similar discussion with Thomas at the Core Python sprint.

In general, I believe that a Minimal Viable Outline needs to be developed by folks driving free-threading before we change a bunch of CPython docs. It's still not clear to me what would be most beneficial to document for CPython users across several personas:

  • those who want high performance from free-threading
  • those who don't want to use free-threading today
  • those who want to try free-threading with their threading and concurrent code that exists in their projects.

willingc • Oct 15 '25 21:10

https://docs.python.org/3/howto/free-threading-python.html#single-threaded-performance also mentions the future performance of 3.14t; since 3.14 is out, this could be updated.

Carreau • Oct 16 '25 08:10

Hello 👋

I recently responded to a question on Python's discourse that I think might be relevant for this work: https://discuss.python.org/t/thread-safety-now-and-in-the-future-no-gil/104297/73

Essentially, Python doesn't have a memory model (or we could say that it has an implicit sequentially consistent model). People coming from other languages may be used to evaluating whether their code is thread safe even under a memory model that allows compilers to modify their code. Thus it can be useful to say explicitly in Python's documentation what the current state is.

A rightful concern that these Python users might have is whether the assumptions they can make now will keep holding true in the future. I understand that making guarantees about the future of Python's memory model, or lack thereof, is very difficult. On the other hand, I think this uncertainty should be spelled out, and the docs should possibly make it clear that there are unknowns about the future.

I hope that the example from Java that I mentioned in the post can help clarify what I mean.
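Absent a written memory model, one conservative pattern the docs could recommend is handing data between threads only through synchronization primitives with documented blocking behavior, such as `threading.Event`, instead of polling a plain flag attribute. A minimal sketch; it leans on the documented semantics of `Event.set()` and `Event.wait()`, not on any formal memory-ordering guarantee, which Python does not currently spell out.

```python
import threading

shared = {}
ready = threading.Event()


def producer():
    shared["value"] = 42  # write the data first...
    ready.set()           # ...then signal that it is ready


def consumer(out):
    ready.wait()          # block until the producer has signalled
    out.append(shared["value"])


out = []
c = threading.Thread(target=consumer, args=(out,))
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
print(out)  # [42]
```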

dpdani • Nov 05 '25 19:11

Thanks for the discussion everyone! After talking to a lot of people and meeting with the Docs WG, here’s our revised approach focusing on CPython documentation rather than moving py-free-threading.github.io.

Target State

By the end of this effort:

  • Built-in types: Thread safety guarantees clearly documented
  • C API: All APIs annotated for thread safety
  • Standard library: Thread safety notes on modules where relevant
  • User documentation: HOWTOs and tutorials for both Python users and extension authors

Action Plan

Near-term:

  • Write terminology/concepts page (likely in py-free-threading initially)
  • Establish documentation principles (what to document, what to avoid, probably also on py-free-threading)
  • ✅ Update free-threading HOWTOs
  • 🔄 Update glossary (in review)
  • 🔄 Review all GIL mentions in docs (in progress)
  • Determine thread safety guarantees for built-in types
  • Update Built-in types and/or data model page with the guarantees
  • Update library FAQ on atomicity guarantees
  • Update C API initialization docs with thread state notes

Medium-term

  • Review this plan after the near-term actions above have been completed, and write a more detailed proposal on how the C API reference docs and the stdlib modules documentation will be updated. Get consensus on that plan from the relevant stakeholders.

Longer-term:

  • Survey C API for thread safety concerns and start annotating all APIs
  • Add thread safety documentation to stdlib modules where it’s relevant
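One reason the built-in type guarantees need careful wording: even when each individual dict operation is safe, a check-then-act sequence built from them is not atomic. A minimal sketch of the race and one lock-based fix; `get_or_compute` is a hypothetical helper, not an existing API.

```python
import threading

_cache = {}
_cache_lock = threading.Lock()


def get_or_compute(key, compute):
    # Racy version, for contrast:
    #     if key not in _cache:            # check
    #         _cache[key] = compute(key)   # act
    # Two threads can both pass the check and both call compute(key).
    with _cache_lock:
        if key not in _cache:
            _cache[key] = compute(key)
        return _cache[key]
```

Holding the lock during `compute` serializes every cache miss; `_cache.setdefault(key, compute(key))` avoids the lock but may call `compute` more than once (only the first stored value wins), so each option comes with a different trade-off that documentation would need to state.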

As we work on CPython docs, we’ll update py-free-threading.github.io with content better suited there. Moving tutorial/HOWTO content from that site can be considered once these documentation efforts are further along.

We’ve also thought about hosting monthly free-threading docs calls where people can join in and express concerns. We’ll announce that in the Docs section of DPO once we get all the details down. I’m also planning to notify the SC about all this if people are okay with the plan.

Feedback welcome!

lysnikolaou • Nov 13 '25 17:11

I reviewed the documentation of a few projects from https://hugovk.github.io/top-pypi-packages/, both ones I have contributed to and ones I believe are more or less representative of good practices in the scientific ecosystem.

Thread safety docs that are not relevant to Python usage, or not present

First, a number of projects have nothing (or close to nothing) on multithreading/free-threading despite being compiled, which I think can be OK; for example, Charset Normalizer and PyYAML.

For a few projects (CFFI, Cryptography), I don't think the documentation is relevant for usage via the Python API. Note that for Cryptography, any docs on thread safety are really about the Rust definition, as it is written in Rust. The same goes for parts of pyarrow/Arrow, where thread safety is mostly covered in the C++ docs.

Docs useful from the Python side

Most of these are a mix and match of various pages: plain-text mentions of threads (sometimes of atomicity and co.), without proper structure, rarely with a section header, and often just a reference to this repository. Most mentions are in the what's new/changelog, usually because the changelog lists PR titles.

I will also note that even the spelling of multi-thread/multithread is inconsistent across projects. I don't know if or how this affects search, but I'm inclined to suggest preferring the hyphen, as at least Sphinx search (in the search bar) will treat multi-threaded as two tokens and thus match thread-safety, while multithreaded might not.

Few documents/docstrings have a dedicated threading section/header/admonition

List of some high level documents for a few libraries.

Most libraries have some limited mentions of threading in high-level documents; those are rarely easy to find, and are more often about performance or controlling the number of underlying threads than about safety.

  • https://numpy.org/doc/stable/reference/thread_safety.html, links back to here.

  • https://numpy.org/doc/stable/reference/c-api/array.html (mentions threads a couple of times, in plain text)

  • https://numpy.org/doc/stable/user/misc.html plain text

  • https://numpy.org/doc/stable/reference/random/multithreading.html

  • https://numpy.org/doc/stable/dev/internals.code-explanations.html plain text.

  • https://numpy.org/doc/stable/reference/generated/numpy.errstate.html (in a .. whatsnew::)

  • https://pandas.pydata.org/docs/dev/user_guide/gotchas.html ("thread safety" section, warning about .copy() + link to StackOverflow, "please use a lock", and the .copy() docstring as a plain text mention of thread safety)

  • https://pandas.pydata.org/docs/dev/user_guide/io.html (HDFStore not thread-safe, but no mention in HDFStore's docstring).

  • https://pandas.pydata.org/docs/dev/user_guide/enhancingperf.html some info + defers to numba docs

  • https://arrow.apache.org/docs/python/csv.html

  • https://docs.scipy.org/doc/scipy/tutorial/thread_safety.html (mentions some modules are not thread safe but not in each individual module documentation)

  • https://docs.scipy.org/doc/scipy/tutorial/parallel_execution.html (BLAS/LAPACK are already multithreaded)

  • https://scikit-learn.org/stable/computing/parallelism.html, by far the most comprehensive, as it touches a bit on joblib, OpenMP, and the fact that NumPy/SciPy are multithreaded via BLAS/LAPACK

  • https://scikit-learn.org/stable/faq.html (why it crashes, or uses more threads than requested)

Module/Function/class Docstring

I don't think a list is worth it: thread safety is almost never documented in docstrings. It is sometimes mentioned, almost always as plain text in the middle of a paragraph, sometimes in a .. whatsnew::, occasionally in a .. warning::, and docstrings often defer to other documents or external resources for threading behavior.

Reflections

This is of course likely incomplete, as it's hard to do an exhaustive review, in particular given the various search terms. Much of the current documentation focuses on threading with the GIL, so there is not much on free-threading (yet). The documentation in the ecosystem is also a bit different from CPython's: it seems more often focused on performance, and often just mentions in passing that something is not thread-safe without further elaboration.

What I'd like to suggest:

  • A suggested name for a top-level documentation section about threading, and in particular free-threaded Python.
  • Maybe provide a package with pre-made directives that turn into admonitions (.. freethreaded::, .. atomic::, .. non-atomic::, ...). These could automatically link to the relevant glossary entries, would be easy to search for, and would provide standard CSS class names so that popular themes can style them specifically. That should make it as easy as possible for maintainers and remove as much variation across projects as possible.

Carreau • Nov 24 '25 19:11