grpc-java

Getting `not a subtype` error on using compatible JARs

Open akanimesh7 opened this issue 1 year ago • 5 comments

What version of gRPC-Java are you using?

1.54.0

What is your environment?

openjdk version "11.0.17" 2022-10-18 OpenJDK Runtime Environment Temurin-11.0.17+8 (build 11.0.17+8) OpenJDK 64-Bit Server VM Temurin-11.0.17+8 (build 11.0.17+8, mixed mode)

What did you expect to see?

Basically, I have this directory structure:

random1 (plugin path for kafka connect)
-- plugin1
-- -- grpc-api:1.54.0
-- -- grpc-netty-shaded:1.54.0
random2 (some other directory, which is in class path but not plugin path)
-- grpc-api:1.54.0
-- grpc-netty-shaded:1.54.0
random3 (some other directory, which is in class path but not plugin path)
-- grpc-api:1.54.0
-- grpc-netty-shaded:1.54.0

Now when I run this plugin using Kafka Connect, I get this exception:

Caused by: java.util.ServiceConfigurationError: io.grpc.ManagedChannelProvider: io.grpc.netty.NettyChannelProvider not a subtype
	at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:589)
	at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1237)
	at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
	at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
	at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
	at io.grpc.ServiceProviders.loadAll(ServiceProviders.java:67)
	at io.grpc.ManagedChannelRegistry.getDefaultRegistry(ManagedChannelRegistry.java:101)
	at io.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:43)
	at io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
	at com.google.cloud.bigtable.grpc.BigtableSession.createNettyChannel(BigtableSession.java:458)
	at com.google.cloud.bigtable.grpc.BigtableSession$3.create(BigtableSession.java:413)
	at com.google.cloud.bigtable.grpc.io.ChannelPool.<init>(ChannelPool.java:248)
	at com.google.cloud.bigtable.grpc.BigtableSession.createRawDataChannelPool(BigtableSession.java:416)
	at com.google.cloud.bigtable.grpc.BigtableSession.<init>(BigtableSession.java:256)
	at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:123)
	at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:88)
	at com.google.cloud.bigtable.hbase2_x.BigtableConnection.<init>(BigtableConnection.java:56)

On analyzing this, I found that there is a loadAll function in ServiceProviders.java. It finds all the providers using ServiceLoader.load(klass, cl). This loads all the service providers, so in my case providers from all the directories are loaded.

I don't understand why I see this exception when all the JARs are the same version. It comes from ServiceLoader.java:1237, where there is a check if (service.isAssignableFrom(clazz)) { and, if it fails, this error is raised. But why is isAssignableFrom returning false? These are providers of the same service, right?

If this code loaded only the service provider inside the plugin directory (since we are using the PluginClassLoader), that would be the ideal case. But for some reason it finds all the providers and stores them.

Ref -- PluginClassLoader

akanimesh7 avatar Mar 07 '24 05:03 akanimesh7

Class loaders instantiate a .class file to a runtime class. If you use two class loaders to instantiate the same .class file multiple times, it produces two completely separate in-memory Class instances that are completely unrelated to each other as far as the JVM is concerned. If you create instances of those classes and try to exchange the instances between the two class loaders, it will fail with a ClassCastException or, as here, with a "not a subtype" error.

The solution is "don't exchange instances between code loaded in different class loaders", or to be very careful about it, for example by exchanging only Runnable instances: Runnable is part of the JDK, so the class loaders normally share the same Class instance.
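
To make the class-identity point concrete, here is a minimal sketch that is not part of gRPC or Kafka Connect. LoaderIdentityDemo, Service, Provider and IsolatingLoader are hypothetical names invented for the illustration, and it assumes the demo's own .class files are readable as resources from its class loader. Each IsolatingLoader defines its own copy of Service and Provider, which roughly mimics two directories that each ship the same JARs.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo types; none of these names come from gRPC or Kafka Connect.
final class LoaderIdentityDemo {
    public interface Service {}
    public static class Provider implements Service {}

    /** Defines Service and Provider itself instead of delegating to its parent. */
    static final class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(LoaderIdentityDemo.class.getClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            boolean isolated = name.equals(Service.class.getName())
                    || name.equals(Provider.class.getName());
            if (!isolated) {
                return super.loadClass(name, resolve); // JDK classes and the rest stay shared
            }
            synchronized (this) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded != null) {
                    return loaded;
                }
                // Assumption: the compiled class is visible as a resource of the parent loader.
                String resource = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(resource)) {
                    if (in == null) {
                        throw new ClassNotFoundException(name);
                    }
                    byte[] bytes = in.readAllBytes();
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
        }
    }

    private static Class<?> load(ClassLoader loader, Class<?> type) {
        try {
            return loader.loadClass(type.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Service and Provider come from one loader, so the subtype check passes. */
    static boolean sameLoaderCheck() {
        ClassLoader single = new IsolatingLoader();
        return load(single, Service.class).isAssignableFrom(load(single, Provider.class));
    }

    /** Service and Provider come from two different loaders, so the check fails. */
    static boolean crossLoaderCheck() {
        Class<?> service = load(new IsolatingLoader(), Service.class);
        Class<?> provider = load(new IsolatingLoader(), Provider.class);
        return service.isAssignableFrom(provider);
    }

    public static void main(String[] args) {
        System.out.println("same loader:  " + sameLoaderCheck());
        System.out.println("cross loader: " + crossLoaderCheck());
    }
}
```

The bytes are identical in both cases. Only the defining loader differs, and that is the same isAssignableFrom condition that ServiceLoader reports as "not a subtype".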

ejona86 avatar Mar 07 '24 19:03 ejona86

Hey @ejona86 Thanks for your insightful response.

My two cents here would be --

The function ServiceLoader.java#nextProviderClass gets called from hasNextService. It calls loader.getResources(fullName), where fullName is META-INF/services/io.grpc.ManagedChannelProvider. The result is the URLs of all the service files visible to the loader, in whichever directory they live (plugin dir, random1, random2 or wherever), because getResources also searches all the parent class loaders.
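
The getResources behaviour described above can be checked with plain URLClassLoader instances. This is only a sketch: GetResourcesDemo, the demo.HypotheticalService file name and the provider names are made up, and Kafka Connect's own PluginClassLoader may behave differently from a stock URLClassLoader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

final class GetResourcesDemo {
    // Hypothetical service name, used only for this illustration.
    static final String SERVICE_FILE = "META-INF/services/demo.HypotheticalService";

    /** Creates a temp directory that contains one service file naming one provider. */
    private static URL dirWithServiceFile(String prefix, String providerName) throws IOException {
        Path root = Files.createTempDirectory(prefix);
        Path file = root.resolve(SERVICE_FILE);
        Files.createDirectories(file.getParent());
        Files.write(file, Collections.singletonList(providerName));
        return root.toUri().toURL(); // the trailing slash marks it as a directory
    }

    /** Returns every copy of the service file that the child ("plugin") loader can see. */
    static List<URL> serviceFilesSeenByPlugin() {
        try {
            URL parentDir = dirWithServiceFile("parent", "demo.ParentProvider");
            URL pluginDir = dirWithServiceFile("plugin", "demo.PluginProvider");
            try (URLClassLoader parent = new URLClassLoader(new URL[] {parentDir}, null);
                 URLClassLoader plugin = new URLClassLoader(new URL[] {pluginDir}, parent)) {
                // The enumeration is drained before the loaders are closed.
                return Collections.list(plugin.getResources(SERVICE_FILE));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        serviceFilesSeenByPlugin().forEach(System.out::println);
    }
}
```

Asking only the child loader returns both service files, the parent's and its own, so a ServiceLoader built on that loader would try every provider name listed in either file.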

Given the above, in ServiceLoader.java#hasNextService we can see that there's a check if (service.isAssignableFrom(clazz)). Since grpc-java loads all the service providers, what ends up happening is --

First iteration --> the clazz var is the service provider class from the plugin directory. Here this check returns true and the code moves ahead.

Second iteration --> the clazz var is the service provider class from random1 directory. Here this check returns FALSE.

Context -- the service file in the plugin dir contains only one class name, and that class is also present in the plugin dir, so iteration 1 works fine. In the second iteration, it tries to load a class name found in a service file that is not in the plugin dir. That class name can well be something other than the one present in the plugin dir, so when it is loaded, it is loaded from the parent of the plugin class loader.

This is consistent with @ejona86's comment just before this. In the second iteration, the class was loaded not by the plugin class loader but by its parent. The service var, however, is still the one loaded by the plugin class loader, since it is present in the plugin directory. Hence the check returns false.

Questions --

  1. Does the above make sense ?
  2. If yes, for solving this can we not load all the classes and rather use the first one ? This will conclude the issue.
  3. In the Java check service.isAssignableFrom(clazz), clazz is loaded each time from a new class name, taken from the service files in different JARs (in plugin dir, random1, random2, etc.). If the provider named in some file is not in the plugin dir, it is loaded by the parent class loader from a different directory. But the service param is always present in the plugin dir and hence always found there. This looks like a gap ??

akanimesh7 avatar Mar 08 '24 15:03 akanimesh7

> Second iteration --> the clazz var is the service provider class from random1 directory. Here this check returns FALSE.

If random1 has gRPC, then plugin should either not have gRPC (to use random1's copy), shade gRPC (rename it, so you can have two copies), or have the classloader hide random1 from plugin.

> If yes, for solving this can we not load all the classes and rather use the first one ?

We need all instances. The ordering is arbitrary.

> This looks like a gap ??

I wasn't able to follow what you think the issue is there.

The service file should reference classes in the same jar.

ejona86 avatar Mar 08 '24 20:03 ejona86

> I wasn't able to follow what you think the issue is there.

So basically consider the following directory structure --

pDir (plugin directory)
	grpc-api-1.39.0.jar
	grpc-netty-shaded-1.39.0.jar
kDir
	grpc-api-1.54.0.jar
	grpc-netty-1.54.0.jar
	grpc-netty-shaded-1.54.0.jar
mDir
	grpc-api-1.54.0.jar
	grpc-netty-1.54.0.jar
	grpc-netty-shaded-1.54.0.jar	

and these are the contents of these JARs:

JARs containing service ... 

grpc-api-1.39.0.jar
	io.grpc.ManagedChannelProvider
grpc-api-1.54.0.jar
	io.grpc.ManagedChannelProvider
	
JARs containing service providers ...
	
grpc-netty-shaded-1.39.0
	META-INF/services/io.grpc.ManagedChannelProvider
		Contents --
			io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider			
grpc-netty-1.54.0.jar
	META-INF/services/io.grpc.ManagedChannelProvider
		Contents --
			io.grpc.netty.NettyChannelProvider
			io.grpc.netty.UdsNettyChannelProvider
grpc-netty-shaded-1.54.0.jar
	META-INF/services/io.grpc.ManagedChannelProvider
		Contents --
			io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider
			io.grpc.netty.shaded.io.grpc.netty.UdsNettyChannelProvider

Now, since the plugin directory contains grpc-netty-shaded-1.39.0.jar and the service file in that JAR lists io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider, clazz would be this class loaded with PluginClassLoader. The check service.isAssignableFrom(clazz) returns true here, since service and clazz are both loaded using PluginClassLoader, both being available in the plugin directory.

Whereas, if the service file chosen is from kDir's JAR grpc-netty-1.54.0.jar, the service provider named in that file is io.grpc.netty.NettyChannelProvider. If clazz is created from this class name, PluginClassLoader cannot find it and delegates to its parent. So clazz is loaded by some loader other than PluginClassLoader, while service is still the one loaded with PluginClassLoader, since it is present in the plugin directory (in grpc-api-1.39.0.jar). Hence the isAssignableFrom check returns FALSE here.

So what I was saying is that this looks like a gap.

akanimesh7 avatar Mar 09 '24 16:03 akanimesh7

Laying out the directories and files is really helpful, but I'm not clear on how the classloaders are organized.

If there is a ClassLoader that transitively contains any two of pDir, kDir, mDir, then it is dangerous and should be fixed. It sounds like the plugin's class loader delegates to kDir. That is broken. If the class loader is working correctly, then you are not allowed to add grpc to the plugin directory. If you want plugins to have their own copy of grpc, then the plugin classloader should never expose grpc from kDir. Servlet classloaders, for example, are generally good about not exposing servlet container internals into servlets' classloaders.

A single ClassLoader can have both kDir and mDir in its classpath, because it will only load classes once and the classes are identical. But that's only good as long as the jars are identical, so it is generally a bad idea.
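
One way to gather the missing information about how the class loaders are organized is to print, for a given class, the JAR or directory it came from and the delegation chain of its loader. The helper below is a hypothetical sketch (LoaderDiagnostics, loaderChain and origin are invented names); it relies only on standard JDK calls and assumes no security manager blocks getProtectionDomain.

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

final class LoaderDiagnostics {
    /** Lists the loader's class name, then each parent's, ending at the bootstrap loader. */
    static List<String> loaderChain(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap"); // a null loader stands for the bootstrap loader
        return chain;
    }

    /** Reports the JAR or directory a class was defined from, when that is known. */
    static String origin(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null
                ? "(no code source, e.g. a JDK class)"
                : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) {
        // Substitute the service class and the provider class under investigation.
        Class<?> type = LoaderDiagnostics.class;
        System.out.println(type.getName() + " <- " + origin(type));
        System.out.println(loaderChain(type.getClassLoader()));
    }
}
```

Running this for both the service class and the failing provider class, from inside the plugin, would show whether they come from different directories and whether the plugin's loader delegates to one that also exposes gRPC.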

ejona86 avatar Mar 12 '24 01:03 ejona86

No response to provide more information, so closing. It would be easy for there to be a ClassLoader issue; the ecosystem isn't great about writing ClassLoaders. But we can still help direct how to avoid most common issues. More information can be provided and then we can reopen.

ejona86 avatar Mar 22 '24 21:03 ejona86