Possible issue with thread safety?

Open mperice opened this issue 3 years ago • 2 comments

We use a Sidekiq process with 3 threads to synchronize between PostgreSQL and a Neo4j DB. Until we migrated to v10 this worked flawlessly, but now, whenever we perform multiple concurrent 'merge' writes (to the same table?), the connection seems to hang indefinitely. If I run the same code with 3 single-threaded Sidekiq processes, it works as expected.

I have tried to find a small code sample that would help debug this issue. This is what I came up with:

Code example (inline, gist, or repo)

This works:

Parallel.each((1..3000), in_processes: 50) do |_|
    puts JobOpening.find_or_create({psql_id: "fe1fb5be-69ca-47a9-813e-4496b6790035"}, {title: "Route Driver for Vending Company"})
end

While this hangs most of the time:

Parallel.each((1..3000), in_threads: 50) do |_|
    puts JobOpening.find_or_create({psql_id: "fe1fb5be-69ca-47a9-813e-4496b6790035"}, {title: "Route Driver for Vending Company"})
end

As a side note, I had no problem running both code samples using neo4jrb v9.6 and neo4j 3.5.19.

Runtime information:

Neo4j database version: 4.0.7
neo4j gem version: 10.0.1
neo4j-ruby-driver gem version: 1.7.0
seabolt: compiled from source, ubuntu 20.04

If you need more information feel free to ask. Love your work by the way!

mperice • Aug 06 '20 15:08

I ran into this as well on macOS. It appears to happen when Neo4j::Driver::DirectConnectionProvider#acquire_connection is called from multiple threads at the same time, since the threads share the FFI seabolt connector object.

Although ActiveGraph appears to put explicit_session and tx in thread-specific variables, ActiveGraphTransactions#send_transaction calls driver.session, which grabs a memoized driver that is shared between threads. When the driver builds a session for each thread, it does so with the same session factory that was initialized in the shared driver, and that factory hands the same connection_provider to every session.
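To make the sharing concrete, here is a minimal sketch of the shape I mean. Every class in it (FakeConnectionProvider, FakeSession, FakeDriver, FakeBase) is a hypothetical stand-in I made up for illustration, not the real activegraph or neo4j-ruby-driver code. It only shows that one session per thread does not help if every session wraps the same provider.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT the real
# activegraph / neo4j-ruby-driver classes.
class FakeConnectionProvider; end

class FakeSession
  attr_reader :connection_provider

  def initialize(connection_provider)
    @connection_provider = connection_provider
  end
end

class FakeDriver
  def initialize
    # One provider is created when the driver is built...
    @connection_provider = FakeConnectionProvider.new
  end

  def session
    # ...and every new session receives that same provider.
    FakeSession.new(@connection_provider)
  end
end

module FakeBase
  # Memoized, so all threads see the same driver instance.
  def self.driver
    @driver ||= FakeDriver.new
  end
end

# Build the driver once up front so the memoization itself is not racing.
FakeBase.driver

# Each thread asks for its own session.
sessions = Array.new(5) { Thread.new { FakeBase.driver.session } }.map(&:value)

distinct_sessions  = sessions.map(&:object_id).uniq.size
distinct_providers = sessions.map { |s| s.connection_provider.object_id }.uniq.size

puts "sessions: #{distinct_sessions}, providers: #{distinct_providers}"
```

If the real provider is not safe for concurrent use, this shape would explain why separate processes work (each has its own driver and provider) while threads in one process do not.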

I found that these two methods consistently hang. The second represents the path ActiveGraphTransactions takes when sending a transaction if tx and explicit_session are both nil or not open (as they will be in a new thread):

def hang1
  threads = (1..10).map do |_n|
    Thread.new do
      (1..100).each do |_m|
        ActiveGraph::Base.read_transaction {}
      end
    end
  end
  threads.each(&:join)
end

def hang2
  threads = (1..10).map do |_n|
    Thread.new do
      (1..100).each do |_m|
        ActiveGraph::Base.driver.session.send(:acquire_connection, Neo4j::Driver::AccessMode::READ)
      end
    end
  end
  threads.each(&:join)
end
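Since both methods block forever when the hang occurs, a bounded join can be handy while reproducing it. This is only a sketch: stuck_threads is a helper name I made up, and the demo uses plain sleeping threads rather than the driver. It relies on Thread#join(limit) returning nil when the limit expires.

```ruby
# Hypothetical helper (not part of any gem): waits up to `timeout` seconds in
# total and returns the threads that have still not finished.
# Note: Thread#join re-raises any exception raised inside the joined thread.
def stuck_threads(threads, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  threads.reject do |thread|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # join returns the thread when it finished, or nil when the limit expired.
    thread.join([remaining, 0].max)
  end
end

# Demo with stand-in threads: one finishes immediately, one never does.
quick = Thread.new { :done }
slow  = Thread.new { sleep } # sleeps until killed

stuck = stuck_threads([quick, slow], 0.5)
puts "stuck: #{stuck.size}"

# Clean up so the script can exit.
stuck.each(&:kill)
```

In hang1 or hang2, replacing the final join with a call like stuck_threads(threads, 30) would let the script report how many threads are blocked instead of waiting indefinitely.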

efivash • Aug 06 '20 16:08

@efivash @mperice let's move the discussion to https://github.com/neo4jrb/neo4j-ruby-driver/pull/47

klobuczek • Aug 06 '20 20:08