ocaml icon indicating copy to clipboard operation
ocaml copied to clipboard

Thread-local storage, take 3

Open gasche opened this issue 1 year ago • 7 comments

This PR is the third PR to propose thread-local storage, after

  • #12719, written by @c-cube and inspired by @polytypic
  • #12724, a second attempt by @c-cube, later adopted by @kayceesrk

#12719 introduced TLS as a separate module from DLS, with its own implementation. I asked whether we should have TLS separate from DLS, there was a debate, with a consensus moving in the direction of replacing DLS with TLS wholesale.

#12724 replaced DLS with TLS, with mostly the existing implementation unchanged, but extra logic to save/restore TLS state on thread context switch. There was a lot of API discussion and wide-reaching name changes.

In the meantime the DLS implementation has changed quite a bit, due to the move to make it thread-safe (but not thread-local) in #12889. We also got more data on third-party usage of DLS ( see for example my inspection of Domain.at_exit use-cases in https://github.com/ocaml/ocaml/pull/12724#issuecomment-2258549438 ).

I have decided to revisit the decision of replacing DLS by TLS. Previously I argued that having just one mechanism (1) is conceptually simpler for users, and (2) means less implementation works for OCaml maintainers; so we should optimize for the majority use-case (threads) and sacrifice the other (domains). In the present PR I try an alternative to alleviate both concerns, which is to expose both mechanisms but with the same interface and the same implementation. I believe that this can provide conceptual consistency, at a manageable cost in implementation complexity.

The present PR is in a functional state, its history is clean, it is ready for review and could be integrated quickly if we like the idea.

What is in this PR

  1. Runtime support for TLS storage, alongside the existing support for DLS storage (most of the code is reused/adapted from the previous PRs).

  2. A new internal stdlib module, camlinternalStorage, that exports both TLS and DLS modules with, basically, the same interface that Domain.DLS has today. Internally this is defined using a functor over the state-access primitives (which differ for DLS and TLS), which is instantiated twice. I massaged the code to ensure that the functor application does not hurt the performance of DLS.get, even on non-flambda compilers.

  3. A new user-exposed stdlib module Thread_local_storage that exports TLS, reused from the previous PRs. DLS remains available through Domain.DLS.

  4. Some tweaks to the "storage" interface that come from review discussions in the previous PRs, in particular the type 'a t rather than 'a key and the use of make instead of new_key. The older names remain available in DLS for compatibility, but are not exposed in TLS.

  5. Simple unit tests for TLS, with and without the threads runtime linked. (But note that TLS is not used in the standard library for now, so we get substantially less test coverage that we had in #12724.)

What is not in this PR

  1. DLS.get relies on a compiler primitive supported by the compiler backend. I have not implemented this for TLS.get, which is just a call to a C function. This is good enough for a first iteration of the PR; eventually we want a primitive for performance, but it makes the change more invasive and can come in a later PR.

  2. I believe that a satisfying solution for TLS should expose a form of Thread.at_exit function, so that we can release thread-local resources. No such function is currently implemented, so the picture is incomplete. Implementing Thread.at_exit in a way that works well with and without the threads library linked is not easy, and better left for another PR. I think that my separate PR #13348 would help moving in this direction (sketch: define CamlinternalThread that implements at_exit differently whether or not the threads machinery has been initialized, and expose at_exit from Thread_local_storage directly.)

  3. #12724 (necessarily) includes a move from DLS to TLS in the OCaml standard library, in particular for Format buffers and the global Random state. No such change is included in the present PR, in part to keep it smaller and in part due to the lack of Thread.at_exit (see point (2) above).

  4. I remain unsatisfied with the new API. I think that "foo-local storage key" should maybe be called a "foo-local variable" or a "foo-local reference", some conceptual change is missing to make the API more consistent with OCaml users -- but maybe we want to expose or at least document both views to remain familiar for domain experts.

    I propose to leave this outside the scope of the current PR, and instead open a separate discussion #13356 to hash out API proposals. Otherwise this has a good chance to monopolize discussions and prevent discussing the broader picture (one local storage module, two?).

    If we converge on a satisfying API proposal quickly, I can update the PR to reflect it (I already tried to incorporate the consensus from previous PRs.) Otherwise an interface change can perfectly be done in a later PR. The functorized design makes it easy to evolve DLS and TLS in a consistent way.

What to look at, review and discuss

I believe that the main question is whether we like the two-modules solution, and whether we want to go for it in time for OCaml 5.3 (feature freeze date rumored to be mid-August, so that would require swift consensus-building).

If you want to look at the two user-exposed APIs, they are:

  • Domain.DLS: https://github.com/gasche/ocaml/blob/thread-local-storage/stdlib/domain.mli#L106-L174
  • Thread_local_storage: https://github.com/gasche/ocaml/blob/thread-local-storage/stdlib/thread_local_storage.mli

(but: I'll open a separate issue to discuss the details of the API design.)

If you want to look at implementation:

  • functorized code for local storage: https://github.com/gasche/ocaml/blob/thread-local-storage/stdlib/camlinternalStorage.ml
  • runtime code (there is not much), shared through static inline functions: https://github.com/gasche/ocaml/blob/thread-local-storage/runtime/domain.c#L2130-L2200

If you want to review the PR in full, I would suggest reviewing commit-by-commit.

gasche avatar Aug 02 '24 20:08 gasche

I opened a discussion thread at #13356 for the details of the local storage API. Please discuss these there, I can easily reflect whatever changes we agree on in the current PR.

gasche avatar Aug 02 '24 20:08 gasche

Today I pushed two commits that were arguably missing from the PR (there was a "things missing from this PR" section in the PR description, which is now removed):

  • basic unit tests for TLS, including tests where the thread runtime is not linked (then TLS behaves just like DLS)
  • a little optimization work to ensure that the functor application does not reduce the performance of DLS.get (the only performance-critical operation in my book) compared to the previous non-functorized definition

As far as I am concerned, the PR could be merged in its current state.

gasche avatar Aug 03 '24 15:08 gasche

We discussed this PR at the maintainer meeting today. There isn't a strong consensus in favor of offering both TLS and DLS in the long run, as is being proposed here (people remain doubtful that DLS has enough real-world use to justify central stdlib support), but there was a consensus on the idea of adding TLS and leaving DLS alone for now, especially if we can share code between them as proposed in this PR.

gasche avatar Oct 28 '24 21:10 gasche

Hi @gasche, this is amazing. Looking at it has been on my TODO list for a while.

I'd love to help move it forward if there's any way I can help.

From a user's perspective, it would be great to have a lower barrier to entry than working directly with this OCaml compiler branch.

For example, say I start experimenting with c-cube/thread-local-storage to adapt my code to the TLS API.

Would it then be possible to switch to your branch more or less transparently once my code is running well?

ejgallego avatar Jun 18 '25 18:06 ejgallego

The blocking factor (to me) appears to be to find a maintainer willing to review this change. The maintainers that are typically interested in multicore aspects of the runtime are @kayceesrk and @OlivierNicole. I suppose that @stedolan, @damiendoligez and @NickBarnes would also be fine reviewers but they are typically unavailable.

gasche avatar Jun 18 '25 19:06 gasche

3. A new user-exposed stdlib module Thread_local_storage that exports TLS, reused from the previous PRs.

Why not Thread.TLS ? I find the asymmetry with Domain.DLS rather jarring.

nojb avatar Jun 18 '25 19:06 nojb

Using a submodule of Thread forces to link the thread runtime. This may not be what library-authors want if they want to write code that is thread-safe, but without deciding for their users whether they will use threads or not. (This being said, one may argue that DLS should be used in this case, with TLS reserved for cases where we know for sure that threads are involved.)

gasche avatar Jun 18 '25 19:06 gasche