concurrent-ruby
concurrent-ruby copied to clipboard
Concurrent::Hash and Concurrent::Array are not fully threadsafe on CRuby
Concurrent::Hash and Concurrent::Array are not fully threadsafe on CRuby.
This can be demonstrated for Array with:
require 'concurrent/array'
array = Concurrent::Array.new
array << 0
20.times.map do |i|
Thread.new { array.map!{|v| sleep 0.001; v + 1} }
end.each(&:join)
p array
This returns [1].
For Hash this code shows the issue:
require 'concurrent/hash'
hash = Concurrent::Hash.new
hash[:a] = 0
20.times.map do |i|
Thread.new { hash.transform_values!{|v| sleep 0.001; v + 1} }
end.each(&:join)
p hash
This returns {a: 1}.
In both cases we would expect to see 20 instead of 1.
* Ruby implementation: Ruby
* `concurrent-ruby` version: 1.1.9
* `concurrent-ruby-ext` installed: no
* `concurrent-ruby-edge` used: no
Hmmm I can see the misunderstanding. The docs say "ensuring only one thread can be reading or writing at a time" and that's what's going on here - one thread reads at a time, and one thread writes at a time, but not both in a single atomic action. The behaviour you're expecting is implemented on the other VMs, but that's really because they over-compensate and lock everything, which also isn't very healthy. I'd welcome discussion.
The docs say: "A thread-safe subclass of Hash/Array". I guess the question is: what is meant by thread safe? I tried googling and wasn't able to find a crystal clear definition.
One level of "thread safety" is that the object remains internally consistent while being used concurrently. As an example where this is not the case one can try manipulating a regular Ruby array from multiple threads using JRuby. CRubys Array and Hash classes maintain internal consistency at all times. IMHO this level is necessary but insufficient to be thread safe.
A stronger form of thread safety is when "all methods contain sufficient internal synchronization that instances may be used concurrently without the need for external synchronization"1.
Wikipedia2 states that thread safe means that the "Implementation is guaranteed to be free of race conditions when accessed by multiple threads simultaneously".
I hope we can agree that the Hash and Array classes provided fails to meet both of these criteria for any method that does both reading and writing on CRuby.
I think the part about "need for external synchronization" in the above quote is particularly interesting. As of now there's no way to use Array#map! safely in a multithreaded environment without external synchronization. You would have to wrap every call to map! in synchronization for it to be safe. Which to me would seem like an indication for something being not thread safe.
If you want these methods to be atomic right now as a workaround, a solution is to use compare-and-swap via Atom or AtomicFixnum.
This is more than I expect from 'thread safe' also and I think the CRuby behavior is fine.
The problem I see is that the library is inconsistent. On both JRuby and TruffleRuby the thread execution is serialized and the block is atomic.
I think having a synchronised map! / each function would be a nice thing for CRuby to have too.
I don't have an application depending on this. I was simply trying things and was surprised by the behaviour. It just seems like something that could catch people by surprise if they're unaware of it.
I don't see how I could make an implementation using #compare_and_set with #map! that returns the correct result consistently.
I don't see how I could make an implementation using #compare_and_set with #map! that returns the correct result consistently.
I see it now :)
Fixing this would mean adding overhead for Concurrent::Array on CRuby.
Not ideal, but it might be the right thing to do for semantics, unsure.
Holding a lock during a block like the one given to map! can also lead to deadlocks (#627), but that's an existing issue and is probably very hard/impossible to solve on non-CRuby, so maybe a lesser concern here.
Currently only that first level of https://github.com/ruby-concurrency/concurrent-ruby/issues/929#issuecomment-994581792 thread-safety is guaranteed. I.e., internal consistency, but it does not make Ruby blocks or operations accessing multiple indices atomic. Only individual read and writes are atomic.
If I understand this issue correctly (and I think that I do), I want to point out that the documentation on Array explicitly states that iteration operations like #each locks the object itself ensuring only one thread can be reading or writing at a time for the duration of the method call. (Paraphrasing.) The actual behavior seems to conflict with the documentation. See the actual text here: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/array.rb#L10
And, for the record, I appear to be running into this issue in my professional development.
@eviljoel Right, this is indeed inconsistent. The documentation is the intended behavior but is indeed not the case on CRuby. I plan to fix this issue and so make it match the intended behavior.