
Neo4j::Driver::Exceptions::SessionExpiredException when all sockets are closed

Open mstrofbass opened this issue 4 years ago • 4 comments

We recently set up a proxy (istio/envoy) between our Rails backend and the Neo4j database. After that switch, we would get a SessionExpiredException after a period of inactivity. Once we did some debugging, we discovered that the proxy has a default inactivity timeout for TCP connections of 60 minutes.

I know that activegraph uses a connection pool, which means it should automatically switch over to another connection when one is lost, but that looks like it's failing here. My best guess is that it's trying to switch to another connection but the proxy has closed all of the connections in the connection pool, and thus it's giving up.

The quick fix was to essentially ping the neo4j database with our periodic backend health check, which ensures that at least one connection stays alive. However, it seems that the driver should attempt to reconnect in this situation or otherwise handle it a bit more gracefully. Obviously there may be other considerations that I'm not privy to, but I wanted to bring it up.
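The workaround above can be sketched roughly as follows. `GraphHealthCheck` and its block argument are hypothetical names for illustration, not part of activegraph. The commented usage assumes `ActiveGraph::Base.query` is how the app runs raw Cypher.

```ruby
# Hypothetical health-check helper (not part of activegraph).
# It runs a trivial Cypher statement through whatever callable the app
# provides, so that at least one pooled connection sees traffic before
# the proxy's idle timeout closes it.
class GraphHealthCheck
  PING_QUERY = 'RETURN 1'

  # The block receives the Cypher string and is expected to execute it.
  def initialize(&run_query)
    @run_query = run_query
  end

  # Returns a small status hash instead of raising, so the caller
  # (e.g. a health endpoint) can decide how to report a failure.
  def call
    @run_query.call(PING_QUERY)
    { status: :ok }
  rescue StandardError => e
    { status: :error, message: e.message }
  end
end

# Assumed wiring inside the existing periodic health check:
#   GraphHealthCheck.new { |cypher| ActiveGraph::Base.query(cypher) }.call
```

The health check has to run more often than the proxy's idle timeout (60 minutes here), and a single ping only guarantees traffic on one connection, not on every connection in the pool.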

Sorry I can't provide an easy way to reproduce this. Actual error we received:

I, [2020-10-27T00:20:45.672589 #1655]  INFO -- : [5f094f96-dd5b-44f1-8404-4cc8098dba0b] method=POST path=/graphql format=*/* controller=GraphqlController action=execute status=500 error='Neo4j::Driver::Exceptions::SessionExpiredException: code: `fff`, error: `4`, state: `4`, error_context: `plain_socket_send(/seabolt/src/seabolt/src/bolt/communication-plain.c:231), send error code: 32`' duration=39.56 view=0.00
F, [2020-10-27T00:20:45.674851 #1655] FATAL -- : [5f094f96-dd5b-44f1-8404-4cc8098dba0b]   
[5f094f96-dd5b-44f1-8404-4cc8098dba0b] Neo4j::Driver::Exceptions::SessionExpiredException (code: `fff`, error: `4`, state: `4`, error_context: `plain_socket_send(/seabolt/src/seabolt/src/bolt/communication-plain.c:231), send error code: 32`):
[5f094f96-dd5b-44f1-8404-4cc8098dba0b]   
[5f094f96-dd5b-44f1-8404-4cc8098dba0b] app/models/graph.rb:20:in `fetch_stats'
[5f094f96-dd5b-44f1-8404-4cc8098dba0b] app/graphql/queries/graph_stats.rb:7:in `resolve'
[5f094f96-dd5b-44f1-8404-4cc8098dba0b] app/controllers/graphql_controller.rb:15:in `execute'

Runtime information:

ruby version: 2.5.7
rails version: 6.0.2
Neo4j database version: 4.0.7
activegraph gem version: 10.0.1
neo4j-ruby-driver gem version: 1.7.2

mstrofbass avatar Nov 15 '20 18:11 mstrofbass

@mstrofbass have you tried to set the config value:

keep_alive: true

This instructs the Neo4j server to send periodic NO_OP messages to the client (I'm not sure how often), which the driver ignores. I understand that the pool should be refreshed with new connections when none of them is intact, but that logic lives in seabolt, which we do not maintain.

klobuczek avatar Dec 08 '20 20:12 klobuczek

I believe I tried that initially to no avail. There's no way for me to retest it at this point.

mstrofbass avatar Dec 12 '20 07:12 mstrofbass

@mstrofbass We encountered the same issue. Have you tried playing with the other configuration settings?

  config.neo4j.driver.keep_alive = false
  config.neo4j.driver.leaked_session_logging = true
  config.neo4j.driver.max_connection_lifetime = 1.minute
  config.neo4j.driver.max_connection_pool_size = 10
  config.neo4j.driver.connection_timeout = 30.seconds

jeperkins4 avatar Feb 03 '21 21:02 jeperkins4

We get the same error when the web app that uses Neo4j via activegraph has not been accessed for some time, e.g. first thing in the morning. After reloading, everything works again. We've tried setting keep_alive to true, but that did not help.

Ruby driver and activegraph 10.0.2, neo4j-community 4.0.11
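Since a reload succeeds after the first failure, one possible application-side stopgap is to retry the failed call once. This is a minimal sketch, not something the driver or activegraph provides: `with_graph_retry` is a hypothetical helper, and whether a retry actually picks up a fresh connection in this setup is an assumption that has not been verified in this thread.

```ruby
# Hypothetical helper (not part of activegraph or neo4j-ruby-driver).
# Re-runs the block when it raises one of the given error classes,
# up to `attempts` total tries, then re-raises the last error.
def with_graph_retry(retriable:, attempts: 2)
  tries = 0
  begin
    tries += 1
    yield
  rescue *retriable
    retry if tries < attempts
    raise
  end
end

# Assumed usage, with the exception class and method from the log above:
#   with_graph_retry(retriable: [Neo4j::Driver::Exceptions::SessionExpiredException]) do
#     Graph.fetch_stats
#   end
```

Retrying is only safe for reads or idempotent writes. A write that partially succeeded before the socket error could be applied twice.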

hng avatar Oct 06 '21 14:10 hng