cider
cider copied to clipboard
ClassLoader streamlining and composability
As part of #3037 I'm trying to get a better sense of how classloaders are used across nREPL, Orchard, clojure.main, and elsewhere. I've been doing some deep diving into this topic, and have collected a bunch of helpers to help inspect and diagnose classloader related issues.
This is a meta issue where I will for starters try to better document the situation, and collect links to various issues and PRs that have been created over time.
There's a lot of getting/setting of the thread-local context classloader, and some binding of clojure.compiler.LOADER. This doesn't compose well, and issues are hard to diagnose. It's easy to end up in a situation where different parts see different classloaders. E.g. when you (require ...) it uses one, but when you do a var lookup in Orchard it uses an other one. It's also easy for one tool to set a classloader, which then gets replaced by a different one, or which no longer is accessible (because it has been replaced or because we're on a different thread) by the time it's supposed to be used.
Tools that currently do classloader tricks
- Pomegrenate
- lambdaisland.classpath
- dynapath
- nREPL sideloader
- nREPL session middleware
- nREPL interruptible eval
- clj-refactor
- Virgil
- boot
- enrich-classpath
Issues
- https://github.com/nrepl/nrepl/issues/8
- https://github.com/nrepl/nrepl/pull/185
- https://github.com/nrepl/nrepl/pull/190
- https://github.com/nrepl/nrepl/issues/206
- https://github.com/clojure-emacs/orchard/issues/20
- https://github.com/clojure-emacs/orchard/issues/103
- https://github.com/clojure-emacs/cider-nrepl/issues/482
And many more (cider-nrepl issues, nrepl issues, orchard issue).
My current thinking is that we need to evolve better patterns which are more explicit, and that we need to reduce the reliance on the "ContextClassloader". The latter being a thread-local and mutable field is where a lot of our predicament comes from.
Some initial guidelines (subject to change, discussion welcome):
- if you rely on a specific classloader to be present when loading resources, then allow that classloader to be passed in explicitly, or if that's not feasible then at least make it configurable via a var/atom/dynamic binding
- don't get/set ContextClassloader if you don't have to. If you need the classloader to be set in a certain scope then dynamically bind
clojure.lang.Compiler/LOADER - if you want the classloader that Clojure currently "sees" then use
clojure.lang.RT/baseLoader, this will useCompiler/LOADERorContextClassloaderdepending on which are set - if you need a
DynamicClassloaderto manipulate the classpath then find the "root" classloader, i.e. traverse parents until you find theDynamicClassloaderthat sits directly above the application classloader - be aware that when you do add libraries there (this is what
enrich-classpathoradd-libdo) then this breaksclojure.java.classpath/classpath(Jira CLASSPATH-9) - avoid
(try (.setContextClassloader ...) ... (finally (.setContextClassloader ...)))as a pattern to temporarily set the classloader, any contextclassloader changes that happen in between will be discarded. BindCompiler/LOADERinstead
be aware that when you do add libraries there (this is what enrich-classpath or add-lib do)
enrich-classpath doesn't use classloaders at all
I may be misunderstanding something. Isn't it enrich-classpath that adds src.zip? because that definitely ends up on the root classloader's path.
It doesn't do that at the moment, the day I get to implement it I wouldn't use classloaders either. Associating this library to https://clojure.atlassian.net/browse/CLASSPATH-9 is therefore misleading.
That issue is caused by Orchard's classloader-based adding of src.zip, which we currently intend to get rid of.
Ref: https://github.com/clojure-emacs/orchard/blob/ca5fae9f1a8292956a9f3d25cdb3dc160e4c80df/src/orchard/java.clj#L67
that definitely ends up on the root classloader's path.
that's a different topic, one thing is performing classloader tricks or breaking clojure.java.classpath/classpath; another is placing items in the classpath beforehand via e.g. a Lein plugin (as opposed to doing it at runtime).
Of course, adding things to the classpath will make those items reachable by classloaders; that's the whole point (and a fairly clean thing to do: items are placed at the tail of the classpath for minimum interference).
If you believe it's problematic feel free to expand on it, but please be mindful of conflating issues.
Other than that kudos for the writeup / initiative!
@plexus Thanks for summarizing the state of affairs. This was long overdue. Here a few thoughts from me:
dynapathshould go away from Orchard - it has caused us a ton of grief, after JDK 9 was released, and @vemv's work onenrich-classpathwill render it redundant thereclj-refactor/refactor-nreplcurrently don't do any classpath manipulation, if I remember correctly (as hotloading dependencies has been broken for a while); of course, this will change if the Pomegranate PR is completed/merged.- It seems to me that Boot is almost dead at this point, so we probably shouldn't worry that much about interactions with it, although if there are some known problems that we can solve on that front that'd be great.
- Orchard reimplements a part of
clojure.java.classpathto better fit the needs of tooling and to avoid a runtime dependency, but I think this part wasn't doing anything with classloaders.
The guidelines outlined by you make sense to me, and they should be fairly easy to adopt. I have to admit that in the early days of nREPL/CIDER I didn't give much thought to the ramifications of our approach, plus things worked reasonably well until JDK 9, which was a wake up call. 😆
Thanks for clarifying @vemv, it seems I did indeed misunderstand enrich-classpath.
one thing is performing classloader tricks or breaking clojure.java.classpath/classpath; another is placing items in the classpath beforehand via e.g. a Lein plugin (as opposed to doing it at runtime).
Agreed, manipulating the classpath outside of the JVM process is outside of the scope of what we are talking about here. Adding items at runtime is. Even if the code does not create or replace classloaders, it can make assumptions about the classloader state that don't always hold.
I did some more testing on this, and my main impression is that we should start getting rid of some of the noise.
clojure.main/repl adds a new DynamicClassLoader to the stack on every invocation, this is a long known issue. It really should check first if there is already a DynamicClassLoader as the current or an ancestor of the context classloader, and if not refrain from doing anything. I'm not sure if there's been an honest attempt to get something like this upstream, but I think we can start by making our own version of clojure.main/repl. I checked and there aren't too many private functions in there so I think we can copy just this function and make the necessary changes. This won't make a big difference in itself, but it will make it much easier to inspect and debug anything related to the classloader.
Similarly when cloning a session we create a new DynamicClassloader and set it on the session original PR. A thread inherits the context classloader from the thread it was forked from, and generally by that time the CCL has been set (we run a no-op :eval very early on), so if it's already there we don't need to set a new one. I briefly ran this by @cgrand and he didn't see a reason not to share the CL between threads, since they are stateless.
Then there's misc/with-session-classloader (used in sideloader, when loading print functions, and when loading deferred middleware) which sets the contextclassloader, then afterwards tries to "reset" it, which unless you are using the sideloader is so far always a double no-op, except that the second call risks undoing other classloader changes. So there I think we can also be more conservative and only set/reset the contextClassLoader when the session classloader differs from the current contextclassloader.
Even better would perhaps be to not touch the contextclassloader, and only bind Compiler/LOADER (which it also does). Clojure will give precedence to Compiler/LOADER if it is set (see clojure.lang.RT/baseLoader) so this should be enough, but probably isn't, because of code out there that calls getContextClassLoader when what they really want is the baseLoader...
I have a local PoC branch that I'm using to try to dogfood this stuff while working on real world projects. Will keep folks posted and submit the changes I feel most confident about in incremental PRs.
Maybe of interest, a debug log of what currently happens when you start nREPL and connect from cider (no cider-nrepl). Every two lines is a call to setContextClassLoader.
[thread] file:line:col
+ instance -> parent
nREPL starts, we immediately call setContextClassLoader 4 times for no reason.
This is because we load the built-in middleware via the dynamic loader. These
are calls inside dynamic-loader that use with-session-classloader.
[main] nrepl.middleware.dynamic-loader:: (user/cl-app)
No-op
[main] nrepl.middleware.dynamic-loader:: (user/cl-app)
No-op
[main] nrepl.middleware.dynamic-loader:: (user/cl-app)
No-op
[main] nrepl.middleware.dynamic-loader:: (user/cl-app)
No-op
Now the server is ready, and the client connects. You can see it immediately building up a chain of classlaoders.
[clojure-agent-send-off-pool-3] clojure.main:51:5 (user/cl-7a032ecb)
+ clojure.lang.DynamicClassLoader@7a032ecb -> app
[clojure-agent-send-off-pool-3] clojure.main:51:5 (user/cl-6cc5539d)
+ clojure.lang.DynamicClassLoader@6cc5539d -> 7a032ecb
First clone request
[nREPL-session-f1ab66a0-7753-4caa-bbc8-e740a81b0a1b] nrepl.middleware.session:211:25 (user/cl-4c481042)
+ clojure.lang.DynamicClassLoader@4c481042 -> 6cc5539d
[clojure-agent-send-off-pool-3] clojure.main:51:5 (user/cl-326d65b6)
+ clojure.lang.DynamicClassLoader@326d65b6 -> 6cc5539d
[clojure-agent-send-off-pool-3] clojure.main:51:5 (user/cl-38bed28e)
+ clojure.lang.DynamicClassLoader@38bed28e -> 326d65b6
Second clone request
[nREPL-session-8428b667-5ec3-4c27-9ac5-c745a03a376d] nrepl.middleware.session:211:25 (user/cl-16eeb3c4)
+ clojure.lang.DynamicClassLoader@16eeb3c4 -> 38bed28e
[nREPL-session-f1ab66a0-7753-4caa-bbc8-e740a81b0a1b] clojure.main:51:5 (user/cl-7c4e20bd)
+ clojure.lang.DynamicClassLoader@7c4e20bd -> 4c481042
[nREPL-session-8428b667-5ec3-4c27-9ac5-c745a03a376d] clojure.main:51:5 (user/cl-3b6cf511)
+ clojure.lang.DynamicClassLoader@3b6cf511 -> 16eeb3c4
[nREPL-session-8428b667-5ec3-4c27-9ac5-c745a03a376d] clojure.main:51:5 (user/cl-14b9e978)
+ clojure.lang.DynamicClassLoader@14b9e978 -> 3b6cf511
The user hasn't done anything yet and we have instantiated 9 classloaders and called setContextClassloader 13 times.
clojure.main/repl adds a new DynamicClassLoader to the stack on every invocation, this is a long known issue. It really should check first if there is already a DynamicClassLoader as the current or an ancestor of the context classloader, and if not refrain from doing anything. I'm not sure if there's been an honest attempt to get something like this upstream, but I think we can start by making our own version of clojure.main/repl. I checked and there aren't too many private functions in there so I think we can copy just this function and make the necessary changes. This won't make a big difference in itself, but it will make it much easier to inspect and debug anything related to the classloader.
Originally nREPL used its own REPL implementation to avoid problems like this one. I recall that after adopting clojure.main/repl Chas regretted this decision, but as clojure.main/repl more or less got the job done we never spent much time thinking about trying to improve it upstream or replacing it.
This was the comment from Chas I was thinking about - https://github.com/nrepl/nrepl/issues/8#issuecomment-342284589
Interesting observation today. futures are scheduled on the clojure.lang.Agent/soloExecutor, which is a fixed size threadpool. So calling future may create a new thread or reuse an existing thread. Threads inherit the ContextClassloader from the thread they are created from.
This means that when you call future, you don't know what he contextclassloader inside that future will be. It may be the one you currently have, it may be one that was the contextclassloader at some time in the past, or it may be one that was set during a previous execution of a future.
This also means that if any future/agent threads were created before clojure.main/repl was called, then these will have the application classloader set, not Clojure's DynamicClassLoader.
Hey @plexus , is this issue good to close by now?
Not sure what the current situation is, but I'm fine with closing this. I'm no longer actively following up on it.