concurrent-ruby icon indicating copy to clipboard operation
concurrent-ruby copied to clipboard

Concurrent::Hash and Concurrent::Array are not fully threadsafe on CRuby

Open jacobat opened this issue 3 years ago • 9 comments

Concurrent::Hash and Concurrent::Array are not fully threadsafe on CRuby.

This can be demonstrated for Array with:

require 'concurrent/array'

array = Concurrent::Array.new
array << 0

20.times.map do |i|
  Thread.new { array.map!{|v| sleep 0.001; v + 1} }
end.each(&:join)
p array

This returns [1].

For Hash this code shows the issue:

require 'concurrent/hash'

hash = Concurrent::Hash.new
hash[:a] = 0

20.times.map do |i|
  Thread.new { hash.transform_values!{|v| sleep 0.001; v + 1} }
end.each(&:join)
p hash

This returns {a: 1}.

In both cases we would expect to see 20 instead of 1.

* Ruby implementation:             Ruby
* `concurrent-ruby` version:       1.1.9
* `concurrent-ruby-ext` installed: no
* `concurrent-ruby-edge` used:     no

jacobat avatar Dec 14 '21 16:12 jacobat

Hmmm I can see the misunderstanding. The docs say "ensuring only one thread can be reading or writing at a time" and that's what's going on here - one thread reads at a time, and one thread writes at a time, but not both in a single atomic action. The behaviour you're expecting is implemented on the other VMs, but that's really because they over-compensate and lock everything, which also isn't very healthy. I'd welcome discussion.

chrisseaton avatar Dec 14 '21 23:12 chrisseaton

The docs say: "A thread-safe subclass of Hash/Array". I guess the question is: what is meant by thread safe? I tried googling and wasn't able to find a crystal clear definition.

One level of "thread safety" is that the object remains internally consistent while being used concurrently. As an example where this is not the case one can try manipulating a regular Ruby array from multiple threads using JRuby. CRubys Array and Hash classes maintain internal consistency at all times. IMHO this level is necessary but insufficient to be thread safe.

A stronger form of thread safety is when "all methods contain sufficient internal synchronization that instances may be used concurrently without the need for external synchronization"1.

Wikipedia2 states that thread safe means that the "Implementation is guaranteed to be free of race conditions when accessed by multiple threads simultaneously".

I hope we can agree that the Hash and Array classes provided fails to meet both of these criteria for any method that does both reading and writing on CRuby.

I think the part about "need for external synchronization" in the above quote is particularly interesting. As of now there's no way to use Array#map! safely in a multithreaded environment without external synchronization. You would have to wrap every call to map! in synchronization for it to be safe. Which to me would seem like an indication for something being not thread safe.

jacobat avatar Dec 15 '21 09:12 jacobat

If you want these methods to be atomic right now as a workaround, a solution is to use compare-and-swap via Atom or AtomicFixnum.

chrisseaton avatar Dec 15 '21 10:12 chrisseaton

This is more than I expect from 'thread safe' also and I think the CRuby behavior is fine.

The problem I see is that the library is inconsistent. On both JRuby and TruffleRuby the thread execution is serialized and the block is atomic.

I think having a synchronised map! / each function would be a nice thing for CRuby to have too.

bestie avatar Dec 15 '21 12:12 bestie

I don't have an application depending on this. I was simply trying things and was surprised by the behaviour. It just seems like something that could catch people by surprise if they're unaware of it.

I don't see how I could make an implementation using #compare_and_set with #map! that returns the correct result consistently.

jacobat avatar Dec 15 '21 12:12 jacobat

I don't see how I could make an implementation using #compare_and_set with #map! that returns the correct result consistently.

I see it now :)

jacobat avatar Dec 15 '21 13:12 jacobat

Fixing this would mean adding overhead for Concurrent::Array on CRuby. Not ideal, but it might be the right thing to do for semantics, unsure. Holding a lock during a block like the one given to map! can also lead to deadlocks (#627), but that's an existing issue and is probably very hard/impossible to solve on non-CRuby, so maybe a lesser concern here.

Currently only that first level of https://github.com/ruby-concurrency/concurrent-ruby/issues/929#issuecomment-994581792 thread-safety is guaranteed. I.e., internal consistency, but it does not make Ruby blocks or operations accessing multiple indices atomic. Only individual read and writes are atomic.

eregon avatar Dec 12 '22 12:12 eregon

If I understand this issue correctly (and I think that I do), I want to point out that the documentation on Array explicitly states that iteration operations like #each locks the object itself ensuring only one thread can be reading or writing at a time for the duration of the method call. (Paraphrasing.) The actual behavior seems to conflict with the documentation. See the actual text here: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/array.rb#L10

And, for the record, I appear to be running into this issue in my professional development.

eviljoel avatar Dec 22 '22 19:12 eviljoel

@eviljoel Right, this is indeed inconsistent. The documentation is the intended behavior but is indeed not the case on CRuby. I plan to fix this issue and so make it match the intended behavior.

eregon avatar Mar 10 '23 12:03 eregon