tyrus icon indicating copy to clipboard operation
tyrus copied to clipboard

ClientManager/BaseContainer shutdown condition not thread safe => RejectedExecutionException

Open nuessgens opened this issue 7 years ago • 2 comments
trafficstars

In rare cases the ClientManager (or rather BaseContainer) shuts down the executor-service although there are active clients (activeClientCounter != 0)

In my test-scenario the application re-uses the ClientManger instance. The test tries to connect to a non existing server (which - of course - fails). As soon as the connect fails a new connect is made by calling ClientManager.asyncConnectToServer() again. There is no managed executor-service so the BaseContainer maintains it's own and tries to stop it if all sessions/clients where disconnected.

In some situations I get an RejectedExecutionException when calling ClientManager.asyncConnectToServer(). I noticed huge differences regarding the frequency (/reproducibility) of this behavior between different machines (e.g. Windows 10 desktop pc Intel Core i7 80 2.9 GHz Oracle JDK 1.8.0_121 and windows 10 notebook Intel Core i7-7700HQ 2.8Ghz Oracle JDK 1.8.0_131. It is easier to reproduce on the notebook.

I added some log-statements (async-logging) to the following classes. BaseContainer

protected void shutdown(ShutDownCondition shutDownCondition) {
        synchronized (EXECUTORS_CLEAN_UP_LOCK) {
            log.debug("Pre shutdown-evaluation ? " + shutDownCondition.evaluate());
            if (shutDownCondition.evaluate()) {
                log.debug("Going to shutdown " + executorService);
                shutdown();
                log.debug("Post shutdown-evaluation ? " + shutDownCondition.evaluate());
            }
        }
}

ClientManager

Future<Session> connectToServer(final Object o, final ClientEndpointConfig configuration, final String url,
                                    boolean synchronous) throws DeploymentException {

        final Map<String, Object> copiedProperties = new HashMap<String, Object>(properties);

        /* Client activity listener must be called before an executor service is obtained from the container, otherwise
         the counter of active client won't be increased and the executor service might be shut down */
        log.debug("onConnectionInitiated...");
        clientActivityListener.onConnectionInitiated();
        log.debug("...onConnectionInitiated");

        final ExecutorService executorService;
        if (synchronous) {
            executorService = new SameThreadExecutorService();
        } else {
            log.debug("Going to acquire exec");
            executorService = getExecutorService();
            log.debug("Got exec " + executorService);
        }
		
	[...]
}

In the rare cases mentioned above the following output is made:

2018-08-17 20:18:21,498 DEBUG [][tyrus-1-thread-537] org.glassfish.tyrus.core.BaseContainer - Pre shutdown-evaluation ? true
2018-08-17 20:18:21,498 DEBUG [][connect-t2] org.glassfish.tyrus.client.ClientManager - onConnectionInitiated...
2018-08-17 20:18:21,498 DEBUG [][connect-t2] org.glassfish.tyrus.client.ClientManager - ...onConnectionInitiated
2018-08-17 20:18:21,498 DEBUG [][connect-t2] org.glassfish.tyrus.client.ClientManager - Going to acquire exec
2018-08-17 20:18:21,498 DEBUG [][tyrus-1-thread-537] org.glassfish.tyrus.core.BaseContainer - Going to shutdown java.util.concurrent.ThreadPoolExecutor@617da596[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
2018-08-17 20:18:21,498 DEBUG [][connect-t2] org.glassfish.tyrus.client.ClientManager - Got exec java.util.concurrent.ThreadPoolExecutor@617da596[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
2018-08-17 20:18:21,498 DEBUG [][tyrus-1-thread-537] org.glassfish.tyrus.core.BaseContainer - Post shutdown-evaluation ? false

Here is the corresponding exception:

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@6532da1d rejected from java.util.concurrent.ThreadPoolExecutor@617da596[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:488)
	at org.glassfish.tyrus.client.ClientManager.asyncConnectToServer(ClientManager.java:417)

IMHO the construct (shutdown evaluation criteria and locking) is not consistent. After T1 evaluated the shutdown-criteria (activeClientCounter.get()==0) another thread T2 could call activeClientCounter.incrementAndGet()/onConnectionInitiated() and get the executor-service by calling getExecutorService(). Because the executorService is not yet null no lock is acquired and the terminating executor-service is returned. After that T1 awakes and shuts down the executor and sets it to null. After that T2 kicks in again and tries to submit the actual connect-task.

nuessgens avatar Aug 17 '18 18:08 nuessgens

To simulate the Thread Scheduling madness one can add Thread.sleep() or Thread.yield() between shutting down the executor service and setting it to null: BaseContainer

public void shutdown() {
        if (executorService != null) {
            executorService.shutdown();
            for (int i = 0; i < 100; i++) {
                Thread.yield();
            }
            executorService = null;
        }

        if (scheduledExecutorService != null) {
            scheduledExecutorService.shutdownNow();
            scheduledExecutorService = null;
        }
}

Neither sleep nor yield do release the EXECUTORS_CLEAN_UP_LOCK (acquired in shutdown(ShutDownCondition)) but should simulate that the other thread was granted processing time and the current thread was put on hold.

Here is my test program (.txt -> .java): TyrusRejectBug.txt

nuessgens avatar Aug 17 '18 19:08 nuessgens

Furthermore due the RejectedExecutionException activeClientCounter is not decremented / onConnectionTerminated is never called. Therefore activeClientCounter remains > 0 forever and the auto-cleanup mechanism does not work anymore / exec will never be stopped

nuessgens avatar Aug 17 '18 19:08 nuessgens

I ran into the same RejectedExecutionException when connections are being fairly rapidly opened and closed. I was attempting to use a single WebSocketContainer for the same reason outlined in #669: otherwise approximately 11 threads are consumed per connection, most of which sit idle. It looks like #927 would mostly address that concern in a future release.

jglick avatar May 30 '25 22:05 jglick