RFCs icon indicating copy to clipboard operation
RFCs copied to clipboard

Atomic record fields

Open gasche opened this issue 1 year ago • 13 comments

A proposed design for atomic record fields.

Rendered version.

In summary:

(* in the stdlib *)
module Atomic : sig
  ...
  module Field : sig
    type ('r, 'a) t
    val get : 'r -> ('r, 'a) t -> 'a
    val compare_and_set : 'r -> ('r, 'a) t -> 'a -> 'a -> bool
  end
end

(* user code example *)
type t = {
  id : int;
  mutable state : st [@atomic];
}

let rec evolve v =
  let cur = v.state in (* <- strong atomic read *)
  let next = next_state cur in
  if not Atomic.Field.compare_and_set v [%atomic.field state] cur next
  then evolve v

gasche avatar May 31 '24 14:05 gasche

@bclement-ocp has an alternate proposal, where we use an extension point to denote a (value, offset) pair, instead of just the offset. This would look like this -- adapting the previous example:

(* in library code *)
module Atomic : sig
  ...
  module Loc : sig
    type 'a t
    val get : 'a t -> 'a
    val compare_and_set : 'a t -> 'a -> 'a -> bool
  end
end

(* in user code *)
type t = {
  id : int;
  mutable state : st [@atomic];
}

let rec evolve v =
  let cur = v.state in (* <- strong atomic read *)
  let next = next_state cur in
  if not Atomic.Loc.compare_and_set [%atomic.field v.state] cur next
  then evolve v

Upsides of this proposal:

  • the phantom-typed value has a simpler type: 'a t instead of ('r, 'a) t
  • in particular, it is possible to expose a single set of primitive for all of atomic values, atomic record fields [%atomic.field v.state], and atomic arrays [%atomic.index a.(i)], [%atomic.unsafe_index a.(i)]

Downside: internally this is represented as a (value, offset) tuple, so it is inefficient if the tuple ends up being built, in particular for code of the form let loc = if ... then [%atomic.index foo] else .... On the other hand, direct applications to a primitive should be easy to compile to optimize the tuple construction away.

gasche avatar Jun 04 '24 15:06 gasche

I thought more about the suggestion of @bclement-ocp, and I see four different designs that are worth discussing.

Running example:

type t = {
  id : int;
  mutable state : st [@atomic];
}

The designs

1. A new two-parameter type ('r, 'v) Atomic.Field.t

With the current proposal, [%atomic.field state] has type (t, st) Atomic.Field.t. This type ('r, 'v) Field.t represents values of type 'v inside a larger structure or type 'r.

2. A new one-parameter type 'v Atomic.Loc.t

This was my initial understanding of the idea of @bclement-ocp: if foo has type t, then [%atomic.field foo.state] has type int Atomic.Loc.t.

This is an improvement over the two-parameter type because it is simpler. It avoids some subtleties coming from the first-class-field nature of the two-parameter proposal, for example any worries related to type-directed disambiguation is gone.

Internally 'v Atomic.Loc.t is represented as a pair of a value and an offset, something like Obj.t * int. The construction [%atomic.field foo.state] desugars to (Obj.obj foo, 1), but the idea is that when directly applied to a primitive Atomic.Loc.fetch_and_add (Obj.obj foo, 1) 42 the compiler will unpack the pair and avoid any allocation. One way to think of the optimization is that the code gets rewritten into Atomic.Field.fetch_and_add (Obj.obj foo) 1 42, where Field is now a secret module, not exposed to users, which includes two-arguments atomic primitives.

3. Full extension of 'v Atomic.t to include pointer+offset pairs

In fact we could get rid of this new type 'v Atomic.Loc.t and just extend 'v Atomic.t for this purpose under the hood:

module Atomic = struct
  type t =
    | Single of { mutable v : 'a }
    | Pair of Obj.t * int
end

The case Single corresponds to the memory representation of atomics today (the same as ref cells), and the new Pair constructor adds pointer+offset pairs to it. (This is not type-safe so this definition would be hidden/private.) With this design, we can give [%atomic.field foo.state] the type Atomic.t, and we do not need to introduce a second battery of atomic functions (fetch_and_add and stuff). The compiler can still optimize Atomic.fetch_and_add [%atomic.field foo.state] to avoid materializing the pair.

I was very happy with this proposal until I noticed a fatal flaw with it: if you do this, then any call on an atomic that is not statically known to be a Pair, in particular all atomics that are in use today, requires a dynamic check to tell if they are in the Single or in the Pair case. This is a big no-no. (In practice the test would be cheap, but people who use atomics want precise control over the code, not dynamic tests added for convenience.)

4. Static extension of 'v Atomic.t

A variant of proposal (3) is to do as if the type Atomic.t was extended with a Pair constructor, but statically reject all programs where Pair would have to be materialized / cannot be optimized away by the compiler. So either [%atomic.field foo.state] gets passed to a compiler-optimized Atomic.foo primitive, gets rewritten (morally) into a two-argument Atomic.Field.foo primitive, or the compiler rejects the program.

If you do this, then you know that all Atomic.t values reaching the runtime will in fact use the Single constructor (just like today), and you do not need to add new dynamic checks.

In other words, with this proposal [%atomic.field foo.state] pretends to be of type int Atomic.t, but this is a magic Atomic.t value that will never exist at runtime: it must be passed to an Atomic.foo primitive directly, which will separate the pointer from the offset.

My current thinking

I think that (3, full extension) is not acceptable for our users in practice. I prefer (2, new one-parameter type) and (4, static extension) to (1, new two-parameter type).

I find it a bit hard to compare (2, new one-parameter type) and (4, static extension).

  • The new one-parameter type proposal (2) leads to more duplication in the API, for reasons that will not be apparent to our users. It relies less on magic: you want to reason about whether the pair gets optimized away when writing code, but the code works as expected (and is in fact decently fast) in any case. Being able to materialize the pointer+offset pair can also be convenient for some use-cases, it is a desirable feature to offer to our users.

  • The proposal (4, static extension) is much less invasive in terms of API surface, we add [%atomic.field foo.state] and also {,unsafe_}index : 'a array -> int -> 'a Atomic.t primitives and that's it. I like the minimality.

    I think that we can do a good job of clear error messages in the cases where the value+offset pair cannot be optimized away, so I don't think that it will feel hard to use in practice. On the other hand, it remains more complex (more things to explain), and we lose the expressivity of being able to dynamically create pointer+offset pairs and pass them around conveniently.

gasche avatar Jun 13 '24 09:06 gasche

Even if you do 2 you might as well define the types like:

type ('r, 'v) field
type 'v loc = Loc : 'r * ('r, 'v) field -> 'v loc

Then people can still get 1 if they like.

lpw25 avatar Jun 14 '24 19:06 lpw25

I like how (2) fits simply into the compiler, using only things that already exist (extension points, types, compiler primitives). In comparison, (4) seems to add a lot of complexity for a gain that is not so big to me, namely no API duplication. I think we can easily explain to our users that Atomic.t is the simple path that will suffice for many applications, and atomic record fields and atomic accesses in arrays are slightly more advanced and are operated by a distinct set of functions in order to fit nicely into the language.

OlivierNicole avatar Jul 05 '24 12:07 OlivierNicole

Thanks, this is useful feedback. I agree that (2) is simpler. It's also probably not too hard to implement (4) on top of (2) later if we decide to do it, so in a sense (2) is a minimum viable proposal.

gasche avatar Jul 05 '24 12:07 gasche

For the record, I disagree with the memory-model label on this RFC, as the proposal is inside the bounds of the current memory model, as long as we make sure that atomic locations remain separate from non-atomic ones.

Speaking of this, I like the fact that atomic arrays can use the same access functions as atomic record fields; but to satisfy the separation of atomic and non-atomic locations, we need to allow atomic locations to be created only from a 'a Atomic.Array.t values, and not from regular arrays. So, as much as it pains me, it seems that there will be a bit of API duplication between Array and Atomic.Array.

Edit.: I just realized that it’s already the case in your proposal.

OlivierNicole avatar Jul 05 '24 15:07 OlivierNicole

For the record, I disagree with the memory-model label on this RFC, as the proposal is inside the bounds of the current memory model, as long as we make sure that atomic locations remain separate from non-atomic ones.

In my mind, anything to do with atomics, relaxed memory semantics and their compilation fall under the broad category of memory model. Recall that atomics in OCaml were introduced in the PLDI 2017 “memory model” paper. I’m using the tag only to find memory model related RFCs easily.

kayceesrk avatar Jul 05 '24 17:07 kayceesrk

There is now an implementation of this RFC -- the [%atomic.loc foo.bar] design evolved from the discussion, not the original description -- as a PR by @clef-men and myself: https://github.com/ocaml/ocaml/pull/13404 .

gasche avatar Aug 28 '24 22:08 gasche

I was asked to comment about whether there might be some reason to prefer the more complex initial design with two type parameters representing an offset into a block of some type and the design with a single type parameter that logically holds a pair of a block and offset into that block.

My initial thought was that there probably aren't that many applications for the added generality of having first-class offsets to atomic fields. However, I then went running last night and realized there might actually be some practically important use cases for the more complex design.

Consider the following FGL sketch:

module FGL : sig
  type state [@@immediate]
  val init : unit -> state
  val lock : 'block -> ('block, state) Atomic.Loc.t -> unit
  val unlock : 'block -> ('block, state) Atomic.Loc.t -> unit
end = struct
  type state = int
  (* ... *)
end

The idea is that a FGL takes only one word and can e.g. be embedded into the nodes of a data structure without allocations. This allows efficient Fine-Grained Locking approaches.

Only a couple of bits are needed to represent the state of an efficient lock and the queue of awaiters can be stored externally just like with a futex.

The init operation generates random bits and masks out the couple of bits used for the lock state. The random bits are then used as the hash value and the (identity of the) block value is used as the key for identifying the queue of awaiters corresponding to a specific FGL.

Addition: You actually want to use both the block address and the field offset to identify the FGL. This way you will be able to have multiple FGLs per block. You might also use bits from the offset as part of the hash value.

For reference:

  • https://ocaml-multicore.github.io/picos/0.6.0/picos_std/Picos_std_awaitable/index.html#mutex
  • https://webkit.org/blog/6161/locking-in-webkit/
  • https://www.rkoucha.fr/tech_corner/the_futex.html

You can, of course, implement something similar with the other design:

module FGL : sig
  type state [@@immediate]
  val init : unit -> state
  val lock : 'unique_key -> state Atomic.Loc.t -> unit
  val unlock : 'unique_key -> state Atomic.Loc.t -> unit
end = struct
  type state = int
  (* ... *)
end

One issue here is now that the lock and unlock operations need to be inlined, IIUC, to avoid an allocation. Also, there is no longer a connection between the 'unique_key and the atomic state.

polytypic avatar Nov 05 '24 09:11 polytypic

Thanks @polytypic. Your reply got me thinking about how we can move from one design to the other. (The value-and-offset design and the offset-only design.)

From the offset-only design, it is possible to implement the field-and-offset design with a GADT, as pointed by @lpw25 at https://github.com/ocaml/RFCs/pull/39#issuecomment-2168612623 .

From the value-and-offset design, it is possible to recover the field, but not its typing information: val obj : 'a Atomic.Loc.t -> Obj.t (which just takes the first element of the pair). This would be enough for your FGL use-case but it is not very nice.

The pair that is giving me pause about the offset-only design is the fact that the type-inference code is more complex, as we have to deal with type-based disambiguation, and probably require a type annotation from the user. It is easy to type-check [%atomic.loc r.f], this is just the usual type-checking code for record field access, but harder to type-checker [%atomic.field f] in isolation, in absence of expected type. We would need to experiment with this to see whether it works out okay in practice (without requiring a type annotation). My hunch is that idiomatic applications like Atomc.Loc.fetch_and_add r [%atomic.loc count] 1 would work okay in non-principal mode (the default), but emit a warning in principal mode, due to the fact that the propagation of type information during application is not considered principal. I guess this would behave similarly to the following:

# type t = { x : int };;
# type u = { x : int };;
# let (vt : t), (vu : u) = { x = 0 }, { x = 1 };;
val vt : t = {x = 0}
val vu : u = {x = 1}
# let access r f = f r;;
val access : 'a -> ('a -> 'b) -> 'b = <fun>
# access vt (fun r -> r.x), access vu (fun r -> r.x);;
- : int * int = (0, 1)
# #principal true;;
# access vt (fun r -> r.x), access vu (fun r -> r.x);;
Warning 18 [not-principal]: this type-based field disambiguation is not principal.

- : int * int = (0, 1)

gasche avatar Nov 05 '24 10:11 gasche

probably require a type annotation from the user

I wouldn't expect this to be a practically significant imbediment. I believe it is likely that the vast majority of uses of atomic record fields will be hidden inside data structure implementations written by people who are relatively experts in these matters. The FGL example, which takes an offset as a parameter, also falls into this category. The main use case for FGL would be writing data structures with fine-grained locking, which requires some expertise, and would usually be hidden behind the data structure abstraction.

polytypic avatar Nov 06 '24 11:11 polytypic

Today I thought more about how we would go for design one (('r, 'v) Atomic.Field.t) and how it relates to design two ('v Atomic.Loc.t). As Leo pointed out, if you have atomic fields, then you can define atomic locations as:

type 'v loc = Loc : 'r * ('r, 'v) Field.t -> 'v loc

There is no point in exposing atomic operations on 'v loc in addition to operations on ('r, 'v) Field.t, but this remains useful for expert users if they want to carry a 'v loc around as a first-class location, say in a data structure.

There are two upsides to having atomic operations take a 'r and a ('r, 'v) FIeld.t separately:

  • this is transparent from a performance perspective, it does not rely on the compiler optimizing the tuple away, so it is probably more robust/modular
  • it is slightly more expressive, with a reasonable example proposed by @polytypic above

I see two clear downsides to ('r, 'v) Field.t (besides "it's more work because we haven't implemented it yet"):

  • As pointed out earlier, it is harder to type-checker [%atomic.field x] than [%atomic.loc r.x] in principal mode, we don't have the type of r at hand to guide disambiguation.
  • As @clef-men remarked, when 'r is an inline record type (the record in Foo of { ... }), then the record type cannot be named in the surface syntax, so users have no way to annotate the field access to disambiguate the field name. This is a serious limitation because inline record types are common in @polytypic programs and would benefit from atomic fields.

I considered three ways out of this quagmire:

  1. We could expose both ('r, 'v) Field.t (as a primiritive type) and 'v Loc.t (defined as a GADT, without atomic operations on it), and then have both [%atomic.field x] and [%atomic.loc r.v], the latter can be used even with inline records.
  2. We could expose surface syntax for inline record types, to make the second downside go away. (This is something that Alain originally proposed when he introduced inline records, say t.Foo for the type of the record argument of the constructor Foo at type t).
  3. We could expose only 'v Loc.t as a primitive type, with a low-level function repr : 'v Loc.t -> Obj.t * int, which is ugly but happens to suffice to implement FGL, if I understand correctly. (This remains less expressive than ('r, 'v) Field.t, because it does not let us safely build an index for a record and then reuse it for another element of the same type.)

Note: If we decide to go for a design with 'v Loc.t as a primitive, we can always revisit this later and expose ('r, 'v) Field.t and re-define 'v Loc.t as the GADT on top of it. But one aspect of the design that I liked was to avoid exposing too many versions of atomic operations, and that approach loses on this front, as one would then end up with three copies of each atomic operation, Atomic.foo, Atomic.Loc.foo and Atomic.Field.foo. (And then possibly Atomic.Array.{unsafe_,}foo, urgh.)

gasche avatar Nov 07 '24 21:11 gasche

In https://github.com/ocaml/ocaml/pull/13707 I implement %atomic.field on top of the current PR (https://github.com/ocaml/ocaml/pull/13404) implementing %atomic.loc; the type-based disambiguation code is more involved, as expected, but more importantly the interaction with inline records is much worse than I thought it would be, basically the feature is unusable on inline records -- they cannot have atomic fields.

Let me quote my preliminary conclusions at the end of the description of that PR:

I see three reasonable options going forward:

  1. Give up on %atomic.field and merge https://github.com/ocaml/ocaml/pull/13404 which implements the simpler %atomic.loc, has been morally approved and carefully reviewed.

  2. Merge something like the current state of the present PR, with a fully working %atomic.loc and additionally a version of %atomic.field that has advantages but does not work with inline records. Later we can consider changing the type-checking of inline records to lift this limitation, and we would end up with two features that work well. (And some duplication as each atomic primitive foo would be available as Atomic.Field.foo and also Atomic.Loc.foo, but there are 5 of them so it's okay.)

  3. Decide that we absolutely want %atomic.field and not %atomic.loc, and that we must first change the type-checking of inline records to lift the current limitations of %atomic.field.

gasche avatar Jan 05 '25 16:01 gasche

@clef-men has a PR that implements atomic arrays at https://github.com/ocaml/ocaml/pull/14380.

Above we had a decision on the usefulness of exposing both bound-checked and non-bound-checked (unsafe) atomic access operations on arrays. The PR implements this, so it exposes yet another set of atomic primitives instead of building on Atomic.Loc directly (the implementation reuses the Atomic.Loc, but the interface is different submodule Atomic.Array).

gasche avatar Nov 25 '25 20:11 gasche