okhttp icon indicating copy to clipboard operation
okhttp copied to clipboard

Support clean shutdown of TaskRunner threads and documentation

Open damien-urruty-sonarsource opened this issue 4 years ago • 26 comments

Hello,

I am precisely in a situation where I need to shutdown a client manually.

We are developing an IntelliJ plugins that uses OkHttp (https://sonarlint.org/). Since latest versions of IntelliJ, plugins can be installed and uninstalled without restarting the IDE. It means that when plugin is uninstalled we need to unload all the classes we use, and OkHttp is part of that.

I followed the guidelines here: https://square.github.io/okhttp/4.x/okhttp/okhttp3/-ok-http-client/#shutdown-isnt-necessary, and I use the latest version (4.7.2).

It appears that the steps described in this page are not sufficient. I can still see an instance of TaskRunner and RealBackend, that retains an ThreadPoolExecutor. I had to call also:

((TaskRunner.RealBackend)TaskRunner.INSTANCE.getBackend()).shutdown();

Which is a bit hacky, but this correctly released the executor.

I think this should be at least documented, or maybe improved with a better API. The shutdown method could possibly be defined in the Backend interface.

We probably do need some public API to do this, but it's awkward because it's a singleton. So you might kill another user of OkHttp in the same process.

Throwing out some ideas

  1. TaskRunner.shutdown() or TaskRunner.INSTANCE.executor()...
  2. Avoid any singleton e.g. use per root client?
  3. Allow OkHttpClient.Builder to opt out of using the Singleton, and avoid building in that case.

yschimke avatar Jul 07 '20 16:07 yschimke

@swankjesse Are any of the solutions above workable for you? Public shutdown API, avoid a singleton TaskRunner completely, or only start the TaskRunner singleton on first use and provide opt out?

yschimke avatar Jul 16 '20 06:07 yschimke

I don't like exposing implementation details like these because they change. For example, when Loom comes we'll use virtual threads and this won't work the same way.

Lemme try to find something. In the interim, use an internal API?

swankjesse avatar Jul 16 '20 10:07 swankjesse

Hey, thanks for your response!

As I said, I already have a workaround with the one-liner I provided above, so there is no urgency. A similar issue in Okio I referenced above is more problematic because I have no workaround: https://github.com/square/okio/issues/738

I agree that having an API exposing implementation details would not be the right approach.

As this TaskRunner is triggered when I use the HTTP client API, I would expect to have a way to shut it down from the same API, e.g. on OkHttpClient, or maybe be able to pass a specific executor in the builder API ? But in the end it might mean not relying on a singleton

Both Okio’s watchdog and TaskRunner do release their threads after 60 seconds. How do you feel about a system property to shorten that lifetime? Maybe something like this?

System.setProperty("okio.threadpool.keepalivemillis", "3000")

We’d use this property for both Okio and OkHttp. If it’s set we honor it, otherwise we do 60 seconds as today.

This is a bit loose and it’s potentially something we could break! But its a singleton which lets us share threads between OkHttpClient instances.

swankjesse avatar Jul 17 '20 00:07 swankjesse

@swankjesse That's why I favour this approach

Allow OkHttpClient.Builder to opt out of using the Singleton, and avoid building in that case.

Maybe in combination with above, allow a client to be configured not to share the thread pool. But the system property seems like the minimal fix here.

yschimke avatar Jul 20 '20 05:07 yschimke

Also the advantage of having an API is that each user can choose which settings to apply. If we have a global system property, then it is shared by every user.

Anyway your decision will be mine!

Either way suggest two PRs. So submit the one that Jesse suggested first.

yschimke avatar Jul 20 '20 09:07 yschimke

The Okio watchdog is an process-wide singleton and there's no place to configure it.

swankjesse avatar Jul 20 '20 11:07 swankjesse

Ok, let me try, I will open a PR

What about a public API

/**
 * Attempt to close any resources held by OkHttp outside client instances.  After cleanly closing all calls and clients
 * in the JVM, this method will succeed. However if any connections are still held open this may fail due to threads that are still in use.
 */
fun OkHttp.tryShutdown(): boolean

Basically non destructive but releases idle resources and returns a boolean whether it was successful? We could use it as a strong hint, and avoid either corrupting OkHttp if it's still in use, or configuring OkHttp suboptimally.

yschimke avatar Aug 08 '20 07:08 yschimke

What is the behavior with the executor service in dispatcher in that method?

JakeWharton avatar Aug 08 '20 13:08 JakeWharton

That one is already public. Frameworks controlling their own OkHttp client instances can already provide their own implementation of the executor service, or access the default (not linked to TaskRunner). So nothing, not behaviour on that executor service.

They can also close their clients, connections and calls.

But TaskRunner is private and a singleton and this shutdown will likely fail if you haven't cleanly closed everything else first, at which point you could look at what is still open.

yschimke avatar Aug 08 '20 14:08 yschimke

I see, you're proposing an unstable API for only internal things?

I would much prefer a public shutdown API that does everything. I'm not convinced the current position on shutdown is a good one.

JakeWharton avatar Aug 08 '20 14:08 JakeWharton

Sorry, I meant a simple single public singleton method to try shutdown and report whether it was successful.

I'm throwing out ideas, mainly because there are a few cases where it's hard to easily and cleanly meeting various needs. But ideally I'd still prefer that we make the defaults generally work, and having a logical additional step to cleanup e.g. override your OkHttp instance to not use the TaskRunner singleton.

yschimke avatar Aug 08 '20 15:08 yschimke

Any news on this? I'm currently running on a GraalVM Native Image this lib https://github.com/minio/minio-java that uses OkHttp. And I would really like OkHttp not to hang my Isolates for 60s with TaskRunner and OkIO Watchdog threads...

davidfrickert avatar Jun 07 '21 21:06 davidfrickert

I thought this particular issue you raised is fixed now after 4.5 https://github.com/square/okhttp/issues/5832

I don't have issues with OkHttp clients + GraalVM + shutdown now that client is using daemon threads

yschimke avatar Jun 08 '21 03:06 yschimke

Hmm, in my case my Isolates stay up for 60s due to TaskRunner and OkIO Watchdog not exiting...

Example code:

private static String run(MinioClient minioClient, int seed, byte[] buffer) throws (...) {
		InputStream stream = minioClient.getObject(GetObjectArgs.builder()
																.bucket("files")
																.object(String.format("file-%d.dat", seed))
																.build());
		for (int bytesread = 0;
			bytesread < size;
			bytesread += stream.read(buffer, bytesread, size - bytesread));
		stream.close();
		return DatatypeConverter.printHexBinary(MessageDigest.getInstance("MD5").digest(buffer));
	}

Could I not be closing something? Or the library that uses OkHttp underneath (Minio java), not be closing something?

Is the behaviour of having daemon threads wait for 60s normal? This is the core problem that I'm having.

davidfrickert avatar Jun 08 '21 09:06 davidfrickert

@davidfrickert these are daemon threads. They shouldn't keep your process alive on their own.

swankjesse avatar Jun 08 '21 11:06 swankjesse

@swankjesse Thanks for your response.

Well, from what I'm seeing it seems to be blocking, due to the logic implemented on Native Image isolate destruction code.

https://github.com/oracle/graal/issues/2617#issuecomment-652371063

They try to interrupt all threads currently executing (including daemon), and wait for them to exit to be able to successfuly close the Isolate and free all the memory. Daemon threads block the successful cleanup if they don't exit on the interrupt - there is no distinction here between Daemon / Normal threads, all must stop, so, the Isolate memory stays alive for 60s - the configured max idle time on these daemon threads of OkHttp/OkIO, which is quite unfortunate. Not sure what to do here to fix this, but I guess I might have to drop OkHttp if the cause is really the daemon threads not cleaning up.

davidfrickert avatar Jun 08 '21 23:06 davidfrickert

Is isolates different than processes here? We want to support GraalVM here, so it's worth seeing how we can fix this, so I want to understand the difference here.

yschimke avatar Jun 09 '21 03:06 yschimke

Yeah @yschimke I'm still a bit new in the concepts so I'm not sure I can adequately explain the insides of it.

They are a concept that enables a program to allocate memory in a strictly isolated heap (from the main program heap, and other Isolates' heap), but they run in the same process afaik.

This Medium post is quite nice explaining it: https://medium.com/graalvm/isolates-and-compressed-references-more-flexible-and-efficient-memory-management-for-graalvm-a044cc50b67e

GraalVM native images now support isolates (multiple independent VM instances in the same process)

davidfrickert avatar Jun 09 '21 09:06 davidfrickert

Thanks for links. They seem close enough to a concept from HHVM (Facebook PHP). So are you reusing the same process with multiple isolates? Or it's still 1:1 with process lifetime?

yschimke avatar Jun 09 '21 09:06 yschimke

No worries! Yeah reusing the same process, afaik, just allocating a Thread to the Isolate for the code that runs in its context.

davidfrickert avatar Jun 09 '21 09:06 davidfrickert

Fix will be to exit on interrupt, then launch a new thread if new work is enqueued.

swankjesse avatar Jun 09 '21 11:06 swankjesse

Thanks @swankjesse, created issue #6702 to address the possible discussion on it.

davidfrickert avatar Jun 11 '21 15:06 davidfrickert

Closing as low priority. https://github.com/square/okhttp/issues/6702 covers the native image use case.

yschimke avatar May 21 '23 08:05 yschimke