Thread-safety for union types
Since the introduction of the multi-threading preview (#8112), we have been aware that the representation of union types is not thread-safe.
Quoting from the introductory blog post:
In Crystal, unions between value-types and reference-types are represented as a tuple of a type id and the value itself. A union of
Int32 | AClassis guaranteed to not have anilvalue. But due to interleaving, the representation of anilcan appear, and a null pointer exception (a segfault in this case) will happen. [...] For now you will need to be aware that shared unions that can appear in class variables, instance variables, constants or closured variables are not safe (but will be).
The following example program demonstrates the issue. Parallel access from multiple threads can result in the class variable's type id being set by the first thread (to Foo), but the value is set by the second thread (the integer 0). Reading this combination results in a null pointer in a reference type that is not nilable.
Hence the program consistently runs into invalid memory access (within a couple hundred iterations or so). Note: It doesn't matter if the code is compiled with -Dpreview_mt or without. It uses the non-public Thread API for explicitness.
class Foo
class_property foo : Int64 | Foo = 0
INSTANCE = Foo.new
@value = 123
end
Thread.new do
loop do
Foo.foo = Foo::INSTANCE
end
end
Thread.new do
loop do
Foo.foo = 0
end
end
loop do
p Foo.foo # Segfault: Invalid memory access (signal 11) at address 0x4
end
It should be clear that crashing the process is unaceptable. The runtime must ensure sound behaviour of union types in a multi-threaded environment.
This topic as also been mentioned in MT schedulers impact (forum.crystal-lang.org). There it has been suggested that doing an atomic compare-and-swap could work for some unions, but not all (especially not for larger structs, @RX14). Another suggested approach is a spin-lock utilizing unused bits of the type id (@funny-falcon).
It should be clear that crashing the process is unaceptable. The runtime must ensure sound behaviour of union types in a multi-threaded environment.
Why is it unacceptable? This is a case of shared memory access and I'd not expect that to be multi thread safe by default, as the performance overhead would be insane. If someone wants that, why is not not reasonable that they handle that themselves?
Also note that this is not in any way unique to union types - this is just as much a problem for big structs that is not part of a union type.
That said - it can perhaps be reasonable to make certain kinds of variables thread safe by default. Not in any way limited to union types. Class properties in particular seems particularly suited to be protected against multi threaded usage due to their global nature.
This is issue is particularly relevant because it's about corrupted metadata in the type system. The type id indicating a different type than the data it describes is at a different level of severity than "just" inconsistent data.
Crystal's type system has a strong promise of safety, and this should hold up under all conditions.
A common idiom in Crystal is assigning the value of an ivar or the result of a method call to a local variable so that the type system can reason about it without outside influences (for example if foo = @foo.as?(Bar) - now foo is restricted to Bar). The very reason this exists is to ensure sound typing in a multi-threaded environment. One requirement for this to work is that reading the instance variable is a sound operation (in terms of the type system). This must be guaranteed for all types.
If we could not uphold this, we'd need to start questioning the understaning of type safety in Crystal.
This is a case of shared memory access and I'd not expect that to be multi thread safe by default, as the performance overhead would be insane. If someone wants that, why is not not reasonable that they handle that themselves?
Safety comes first. Crystal usually tries to make the default behaviour as safe as possible to avoid unexpected footguns. More performant alternatives with less safety gurantees are usually opt-in, not opt-out.
@yxhuvud we're not talking about "obvious" cases like mutating an Array, I'd remember to protect accesses (usually) but updating one value? That's usually thread safe... until it's not because Crystal decided to create an union? Nasty. That could go unnoticed and lead to modify whatever in a program :fearful:
Performance hit should be taken with a grain of salt (and measured). Globals or even closured data may not be mutated that much that the impact would lead to terrible performance. Static analysis could detect safe cases and skip some safety checks. Heck, look at this article the impact of always checking a generational number every time it dereferences a pointer was only 10% (before static analysis and manual opt-in to skip checks) :astonished:
we're not talking about "obvious" cases like mutating an Array, I'd remember to protect accesses (usually) but updating one value? That's usually thread safe... until it's not because Crystal decided to create an union? Nasty. That could go unnoticed and lead to modify whatever in a program
The "usually" part is hard to justify when any sufficiently large aggregate will have the same issue; after all, an Int64 | Foo is really just an {Int32, Int64[1]} at the ABI level. IMO both circumstances are equally obvious here; restricting thread-level parallelism to spawn does not magically relieve the programmer of the responsibility to take care of concurrent accesses, whether they manifest via the type system or not.
That said, MT safety looks justifiable for every use of class_getter / class_property in the standard library, most of them due to #14905. This also goes for types that fit into a single register and aren't polymorphic, like Log.progname : String. So I wouldn't mind if class_* is MT-safe by default, as long as the same concurrent control isn't baked into the ABI level, i.e. this will still require manual synchronization by the programmer:
class Foo
@@foo : Int64 | Foo = 0
INSTANCE = Foo.new
@value = 123
Thread.new do
loop do
@@foo = INSTANCE
end
end
Thread.new do
loop do
@@foo = 0
end
end
loop do
p @@foo # this is expected to break
end
end
the intent being that if a faster alternative is available, such as grouping multiple accesses into a critical section, then the programmer should be able to express precisely that without the hidden abstraction costs.
We do lots of expensive bound checks and lots more checks everywhere to avoid undefined behavior as C is plagued with, but union types, which are core to the crystal language, and whose codegen is an internal implementation detail of the compiler, are left to all the developers to take care of 😞
Getters are one thing, but what about a closured union that becomes shared across threads?
We can explore ways to make it safe, ways to opt out of safety (then the developer becomes responsible for the safety) and explore ways to skip the checks, then we can decide how to proceed.
Again - this has nothing to do with union types - this is just as problematic when accessing big structs. If they can change while reading the value then you are out of luck.
Consider the original example but changing foo property to instead be an instance of type
struct Foo
property foo : Int64 | Foo = 0
end
Would that be type safe under the suggested scheme? I don't see how short of implementing a borrow checker.
Getters are one thing, but what about a closured union that becomes shared across threads?
The only way to reach sanity in the case of shared closures is to a: make sure it is properly initialized before accessing it from other threads and b: Don't change the value. If we can figure out a way to enforce properties like that then that could be a way to achieve safety.
edit: oh, and c: Make sure the value actually is there, in case it refer to a local variable. Unless there is something that make sure it doesn't go out of scope (like structured concurrency) there really isn't anything guaranteeing the originating scope hasn't returned.
This issue has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/charting-the-route-to-multi-threading-support/7320/1
Trying to sum up:
This is indeed just a regular data race issue (caused by parallelism): any struct with 2+ values will need 2+ loads to read, and 2+ stores to write, and if reading and writing happen at the same time in multiple threads, then a thread can read an invalid value.
What makes it more problematic for mixed type unions, is that it breaks the type system. It's not data corruption of user code, it's about the compiler happily choosing an invalid path based on an invalid read of the value type (:boom:).
Yet, I believe there are little reasons to fix mixed type unions only: if we can identify a shared union (class variable or closured local variable) and protect it with a rwlock for example, then we can identify shared variables in general and protect them all with a rwlock.
Technical idea:
A crystal type id is a 32-bit number, and because of padding, we should often have space for another 32-bit number on 64-bit systems because of padding. We could make the padding explicit for every union, so memory layout is the same everywhere, and we use this 32-bit word to implement a rwlock.
This would be enough for a basic rwlock (spinning with Thread.delay then Thread.yield). An efficient rwlock needs a list of waiting threads, which means we also need one pointer... unless we do something akin to futexes or parking lot: use the word's address to keep the rwlock in a global hash (at most as large as the total number of threads).
It might be possible to reserve a 32-bit space for class variables and closured variables?
Alternative or preliminary work to reduce the mandatory rwlocks:
We could implement a linter to detect potential shared accesses, so users can at the very least protect these cases.
We could make it smarter and smarter by proving the global is safe. For example: it only needs 1 load/store (number primitives or even a Reference or Pointer) or it's never assigned after initialization, or only ever accessed from the same fiber, or all accesses happen within a synchronization primitive (mutex, rwlock), ...