clasp icon indicating copy to clipboard operation
clasp copied to clipboard

Add Support for Atomic Operators (like CAS, etc.)

Open Shinmera opened this issue 6 years ago • 11 comments

It would be great if Clasp offered support for CAS and other atomic operators in order to allow the implementation of lock-free datastructures and algorithms.

A variety of implementations already offer support for operations like cas, atomic-incf, atomic-push, and atomic-pop. For instance, both ECL and SBCL follow the protocol described in the SBCL manual.

I've accumulated the information for atomic operator support in a new portability library called Atomics. For Clasp to be supported, it would need to offer CAS on at least svref, car, cdr, and structure accessor places. Naturally it would be great if it offered more than that still, but those are the most fundamental.

Shinmera avatar Jan 24 '19 09:01 Shinmera

What we would need to support atomics... LLVM has a cmpxchg instruction that can be used as long as we compute an address, so we don't have to worry about the machine too much (thankfully). C++ has atomics but I'm thinking working with these things through C++ is not the correct solution.

In the compiler we access car, cdr, and svref (through aref-instruction) by computing a memory address, so I think we should be able to do those without an enormous amount of trouble.

Structures (by which i mean, structures with no :type) may be more problematic. At present we do not keep track of what's a structure accessor - they're just defined as inline functions. The writers are done through define-setf-expander/defsetf, so if we had SBCL's cas places we could throw in a define-cas-expander as well.

But the real issue, and also the issue with (funcallable-)standard-instance-access, is that we have a double indirection for instance slots (including structure-object instances) - the object doesn't include the slots directly, but instead has a "rack" of them, in order to support swapping things out with change-class. I'm not sure what the semantics would be for simultaneous atomic operations and change-class, or how we would deal with it independent of that.

Bike avatar May 09 '19 16:05 Bike

For the purposes of Atomics I would be fine with documenting that using CAS in the presence of change-class leads to undefined behaviour.

Shinmera avatar May 10 '19 08:05 Shinmera

guess that works. then CAS on an instance is basically get the rack not-necessarily-atomically, then cas the spot in the rack.

Bike avatar May 10 '19 15:05 Bike

pkhuong in #sbcl reports that the semantics for CAS during change-class/class redefinition are the same as for other slot accesses, which is to say it may not propagate to the upgraded instance, so same as this.

Shinmera avatar May 10 '19 15:05 Shinmera

If there's any changes about this, please tag this issue in the relevant commit messages so that I can start work on adding it to Atomics.

Shinmera avatar May 11 '19 09:05 Shinmera

Planning:

For the interface, using sbcl's compare-and-swap system seems fine for cas. That's sufficient to implement atomic-whatever, but we might also want to implement read-modify-write operations more directly since we may be able to do them without a cas loop. If we do that, LLVM's atomicrmw instruction supports fixnum addition and subtraction, and/nand/or/xor, fixnum max and min, and floating point addition and subtraction. But to begin with cas should be fine, and we can define atomic-incf and stuff in terms of it.

LLVM has a target-specific size limit on what it can do atomically. It's probably just word length though.

SBCL supports car, cdr, aliases, svref, symbol-plist, symbol-value, slot-value, (funcallable-)standard-instance-access, and structure accessors as CASable. We should be able to do all of those... eventually. Cons and instance accessors are a good place to start probably.

LLVM also lets us use weak CAS and pick an ordering (acquire, release, bla bla) for our operations. That might be worth looking at much later, but for now the usual strong/sequentially consistent operations oughta be fine.

As for what we need to do:

First, when we generate loads or stores of memory that's shared between threads (e.g. data structures), we need to mark them as atomic. We don't do that, and according to llvm semantics that means they're all undefined. In practice it seems to work okay, but for the sake of conformance we really should do this. This doesn't add any new capabilities.

Second, compiler support for compare and exchange: llvm supports this with the cmpxchg instruction. It just works with memory addresses, and we already do that stuff. We pretty much just need to add some compiler infrastructure in bclasp and cleavir. Nothing super difficult.

Third, the runtime. This is trickier. Similar to point one, we should strictly speaking have all our thread-nonlocal structures more full of atomics, e.g. conses and vectors should have elements of std::atomic<x> type instead of just x. C++ doesn't, as far as I know, guarantee the object representation of atomics compared to their underlying types, so as soon as we combine this with the compiler work we run into undefined behavior very quickly, but that's like most of what we do so whatever. Anyway, we could define CAS functions in the C++ once we do this.

Two and three could both get us to supporting atomics but I'm gonna go with two probably, at least to start.

Bike avatar Dec 17 '19 18:12 Bike

Do note that atomic-incf/decf in Atomics differs from a mere CAS, since they perform modular arithmetic on some implementations and can be used on storage that's not restricted to element-type T.

Shinmera avatar Dec 17 '19 18:12 Shinmera

clasp doesn't have non-T slots in structs yet anyway. we'll see how things shake out...

Bike avatar Dec 17 '19 19:12 Bike

Arrays are another contender.

Shinmera avatar Dec 17 '19 19:12 Shinmera

unless there's a conceptual reason i'm missing i don't think we should have a problem CASing unboxed slots. Have to define it so that it's EQ for boxed and numerically equal for integers, etc., but that's fine.

Bike avatar Dec 17 '19 20:12 Bike

All right some more details on this. Clasp now has a CAS interface usable with mp:cas, and many structures can be accessed atomically safety (conses, instances, symbol stuff, but not arrays yet).

But I think for a really usable atomics library we need more. Especially, to define the semantics of atomic loads and stores in more detail. Up there I said sequential consistency would be fine but I'm rethinking that, as having all accesses be sequentially consistent may imply insertion of barriers to prevent reordering, etc. I know we have a lot of slowness to worry about in Clasp already, but avoiding a memory barrier on every car sounds good. Almost every access doesn't actually need atomicity, after all.

A while back I tried sketching out a language extension for atomics but it needs work. But for supporting things properly I think we need the following:

  • Primitives for accessing places atomically
    • ...with various atomic orders
  • Primitives for CAS (mostly done)
    • ...with various atomic orders
  • Primitives for atomic read-modify-write operations the processor can perform directly (e.g. modular addition, bitwise operations)
    • ...with various atomic orders
  • Usable interface for atomic accesses, like the atomic macro in that gist
  • Usable interface for CAS (mostly done)
  • A way to define and use atomic read-modify-write versions of Lisp non-atomic read-modify-write operators (e.g., atomic-incf for incf, atomic-push for push)
    • ...and have them use machine atomic RMW operations when available

This will take some work. One tricky bit is that closed over lexical variables should be accessible atomically, CASable, etc., as well (as closures can be shared between threads). This will probably require some fairly deep compiler support, like a casq special operator.

It would be good to have this sort of library because, perhaps unlike existing implementations, we would like to use it internally. CLOS is used extensively by the compiler, and we have options to use the compiler multithreadedly, so it should be thread-safe in a lightweight way. For example, in order to implement the thing where interpreted discriminating functions are compiled after a certain number of calls, we currently have an atomic field in the C++ definition of funcallable instance, even though not all funcallable instances are generic functions; it would be good to be able to implement even a simple atomic counter in Lisp, and ideally without using CAS.

Bike avatar May 18 '20 12:05 Bike