Automatically reconnect to an LDAP server that was down using RoundRobinServerSet & LDAPConnectionPool

Open • andreimoga opened this issue 6 years ago • 1 comment

I'm using the latest version, 4.0.9, and the scenario is as follows:

  • 2 LDAP servers
  • a connection pool created using RoundRobinServerSet & LDAPConnectionPool (new LDAPConnectionPool(new RoundRobinServerSet(addresses, ports), new SimpleBindRequest(user, pass), 10))
  • 2 execution threads that do some queries
  • use the netstat tool to check that connections exist to both LDAP servers (there is one per server, because I have just 2 threads and 2 servers)
  • after a while, stop the second server
  • all connections (2 in this case) now go to the first server (checked with netstat)
  • restart the second server and use netstat to see whether the connection pool rebalances; it does not

I have partially solved it by using PruneUnneededConnectionsLDAPConnectionPoolHealthCheck with minAvailableConnections = 1, but it is not a clean solution. Extending LDAPConnectionPool is out of the question because it is a final class. I have also implemented 2 different LDAPConnectionPoolHealthCheck variants, but neither is satisfactory, because of either missing information about in-use connections or too many connections opened by probing all servers with createIfNecessary = true.

andreimoga • Feb 01 '19 10:02

The LDAP SDK doesn’t really try to monitor the health of the backend servers, so when a server goes down, it doesn’t keep checking it to see when it comes back up. As such, it’s not designed to do automatic rebalancing in this way, and we’re probably not going to add that feature. However, you can configure your connection pool so that it happens naturally within a time frame that you specify.

Whenever the LDAP SDK needs to establish a new connection, it will use the configured ServerSet to obtain it, and that connection will remain established until one of the following things happens:

  • The LDAP SDK decides that the connection is no longer valid through health checking
  • A client tells the LDAP SDK that the connection is no longer valid by calling a method like releaseDefunctConnection or replaceDefunctConnection
  • A client tells the LDAP SDK that the connection is no longer needed by calling a method like discardConnection or shrinkPool
  • The connection has been established for longer than its maximum connection age

In your case, you want to do two things:

  • You should make sure that the pool is configured with a maximum connection age, and also a maximum defunct connection age. The maximum connection age specifies how long a connection should remain established before the pool will throw it away and replace it with a newly established connection. The maximum defunct connection age is similar, but if a connection is closed because either the LDAP SDK thinks the connection is invalid or your code tells it that it’s invalid by releasing it as defunct, then the connection created to replace it can have a shorter maximum connection age.

  • Instead of the round-robin server set, you should be using the fewest connections server set. With the round-robin server set, the LDAP SDK just maintains a circular list of all of the servers, and whenever it needs to establish a new connection, it will just try each of the servers in order from that list until it successfully establishes the connection or until it exhausts all of the possibilities. In a case where one of the servers has gone down and all of the connections have migrated to the other server, then it’s going to take the round-robin server set a while to rebalance things evenly because it’ll take several rounds of waiting for the maximum connection age for things to get evened out. On the other hand, if you’re using the fewest connections server set, then the LDAP SDK will keep track of the number of connections it has established to each server and will try the one with the fewest number of connections established. In this case, if one of the servers goes down and all the connections get migrated to the other one, once the downed server comes back up, then all of the new connections will go to that one until the number of connections across both servers is equal again.
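To make the difference between the two strategies concrete, here is a toy model in plain Java. It is not the LDAP SDK's actual code: the class and method names are hypothetical, the round-robin cursor is assumed to start at the first server, and ties in the fewest-connections case are assumed to go to the lowest index. It starts with a pool of 4 connections all on server 0 (the state after server 1 was down), with both servers reachable again, and replaces the first 2 connections as if they had reached their maximum age.

```java
import java.util.Arrays;

/** Toy model of the two selection strategies; NOT the LDAP SDK implementation. */
final class ServerSelectionSketch {
    // Position in the circular server list for the next round-robin attempt.
    private int nextSlot = 0;

    /** Round robin: start at the next slot, then walk the list until a server is up. */
    int pickRoundRobin(boolean[] up) {
        int start = nextSlot;
        nextSlot = (nextSlot + 1) % up.length;
        for (int i = 0; i < up.length; i++) {
            int candidate = (start + i) % up.length;
            if (up[candidate]) {
                return candidate;
            }
        }
        return -1; // no server reachable
    }

    /** Fewest connections: pick the reachable server with the fewest established connections. */
    static int pickFewestConnections(boolean[] up, int[] established) {
        int best = -1;
        for (int i = 0; i < up.length; i++) {
            if (up[i] && (best < 0 || established[i] < established[best])) {
                best = i;
            }
        }
        return best; // -1 if no server reachable
    }

    /**
     * Starts with every connection on server 0, both servers up, then replaces
     * the first `replacements` connections. Returns connections per server.
     */
    static int[] simulate(boolean useFewest, int poolSize, int replacements) {
        boolean[] up = {true, true};
        int[] connectionServer = new int[poolSize]; // all zeros: everything on server 0
        int[] counts = {poolSize, 0};
        ServerSelectionSketch roundRobin = new ServerSelectionSketch();
        for (int i = 0; i < replacements; i++) {
            counts[connectionServer[i]]--; // the aged-out connection is closed
            int target = useFewest
                    ? pickFewestConnections(up, counts)
                    : roundRobin.pickRoundRobin(up);
            connectionServer[i] = target;
            counts[target]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate(false, 4, 2))); // prints [3, 1]
        System.out.println(Arrays.toString(simulate(true, 4, 2)));  // prints [2, 2]
    }
}
```

In this model, two replacements are enough for the fewest-connections strategy to even out the pool, while round robin sends one of the two new connections back to the already loaded server.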

If you set up the connection pool in this way, then the number of connections should be rebalanced within a period of time equal to the maximum connection age after both servers are available again. If you want this to happen more quickly, then you can use a lower maximum connection age.
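A minimal sketch of that setup, assuming the same addresses, ports, user, and pass values as in the question. The class name, the method name, and the two age values (5 minutes and 1 minute) are illustrative choices, not recommendations from this thread. It needs the LDAP SDK on the classpath and reachable servers, so treat it as a configuration fragment and check the method names against the Javadoc for your SDK version.

```java
import com.unboundid.ldap.sdk.FewestConnectionsServerSet;
import com.unboundid.ldap.sdk.LDAPConnectionPool;
import com.unboundid.ldap.sdk.LDAPException;
import com.unboundid.ldap.sdk.SimpleBindRequest;

/** Hypothetical helper; names and timings are assumptions for illustration. */
final class RebalancingPoolSketch {
    static LDAPConnectionPool createPool(String[] addresses, int[] ports,
                                         String user, String pass)
            throws LDAPException {
        // Prefer the server that currently has the fewest established connections.
        FewestConnectionsServerSet serverSet =
                new FewestConnectionsServerSet(addresses, ports);

        LDAPConnectionPool pool = new LDAPConnectionPool(
                serverSet, new SimpleBindRequest(user, pass), 10);

        // Replace any connection that has been established for more than 5 minutes.
        pool.setMaxConnectionAgeMillis(300_000L);

        // Connections created to replace a defunct one get a shorter lifetime,
        // so load shifted by an outage is redistributed sooner.
        pool.setMaxDefunctReplacementConnectionAgeMillis(60_000L);

        return pool;
    }
}
```

Lowering either value speeds up rebalancing at the cost of more frequent reconnects and binds.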

dirmgr • Feb 01 '19 18:02