
Loading CUDA binaries

Open blueberry opened this issue 4 years ago • 4 comments

Hi Marco,

When JCuda tries to load binaries, can it be "tricked" into loading CUDA binaries from a third-party JAR distribution? bytedeco's JavaCPP CUDA has an optional JAR distribution of all CUDA versions.

I use this "trick" for MKL. Neanderthal-native is a hand-written JNI wrapper for MKL, and it normally loads the libraries from the system-wide MKL installation. However, some users prefer an all-JVM option. Here's what works:

  1. Add the desired javacpp MKL binary distribution JAR to the classpath.
  2. Before anything else, try to load a javacpp class related to the binaries. This triggers loading of the binaries from the JAR instead of from the system.
  3. When my own loading code, unaware of javacpp, tries to load the system binaries, it sees that the same binaries have already been loaded and just uses those. I didn't have to change any of my loading code (I use the Maven NAR plugin for that). It simply works as described above.
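The classpath trigger in step 2 works because the JVM runs a class's static initializer the first time the class is touched. A self-contained sketch of that mechanism, using a stand-in class instead of a real JavaCPP presets class (all names here are illustrative, not the actual JavaCPP API):

```java
// Stand-in for a JavaCPP presets class whose static initializer
// would extract the native library from its JAR and System.load() it.
class NativeBundle {
    static boolean loaded = false;
    static {
        // In JavaCPP this block would unpack the .dll/.so from the JAR
        // into a cache directory and load it from there.
        loaded = true;
    }
}

public class TriggerDemo {
    public static void main(String[] args) {
        // Step 2 of the trick: merely touching the class runs its static
        // initializer, so the natives get loaded before anything else.
        System.out.println(NativeBundle.loaded);   // → true
    }
}
```

After this, any later attempt to load the same native library by the wrapper's own code is a no-op, which is why step 3 requires no changes to the existing loading code.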

Is it possible to do the same for JCuda? That might be a good solution for the problem of "one app needs CUDA 10.2, and another is compiled for 11.0"...

blueberry avatar Sep 19 '20 10:09 blueberry

It would certainly be desirable to be more version-agnostic. The 1:1-correspondence between the JCuda version and the CUDA version is the reason why JCuda can, in practice, only be used in a very controlled environment. This means that it's practically impossible to deploy JCuda to end-users, unless they are forced to install exactly the right CUDA version...

I'm not sure whether I understood the approach that you're using for MKL correctly, so please correct me if I'm wrong:

Very roughly speaking, the JavaCPP classes have some magic System.loadLibrary("mkl") in one of their class initializers. When the class is loaded, System.loadLibrary("mkl") is called. This loads the real, actual MKL implementation library (let's call it MKL.DLL). Then you can use System.loadLibrary("neanderthal-mkl");, which loads your neanderthal-mkl.DLL, and rely on the fact that the underlying MKL implementation was already loaded, right?

And if I understood this correctly, then the main benefit is that the user does not have to install the real, actual MKL implementation, and does not have to think about where to place the MKL.DLL, because it was "magically" loaded by the JavaCPP class, right?

Some questions that arise when trying to apply a similar approach to JCuda:

  • Where is the actual MKL implementation located when it is loaded by JavaCPP? Is there some javacpp-mkl.jar that contains the MKL.DLL and MKL.SO implementation libraries? (EDIT: This is answered, to some extent: I had a look at the JavaCPP distribution, and they do contain the JARs with the DLLs - But I'll have to take a closer look at this, specifically how they do this for CUDA)
  • How do they handle versioning? Specifically: How do you make sure that the MKL.DLL that you are loading via JavaCPP is the right one that matches your neanderthal-mkl.dll?
  • How large is the implementation, i.e. the actual MKL.DLL?

(Maybe not all of them are relevant, if I understood something wrong)

I already considered packing the real CUDA libraries into the JARs, so that they can be loaded as necessary. But this raises several possible killer questions:

  • Is this compatible with the CUDA licensing model? I'm pretty sure it is not. NVIDIA is very restrictive here (consider the fact that you have to register there to even just download cuDNN. Did you read the "Terms And Conditions" that you agreed to when downloading it? Sure, neither did I ;-) ). I once tried to find reliable information, and think that only the "cudart_..." library can be distributed to end users this way. But I am not a lawyer...
  • How well does this work for CUDA, considering the strong connection between CUDA and the underlying graphics card driver? For example, I assume that if someone installs a graphics card driver in a certain version, and this version supports CUDA 10, then trying to load the CUDA 11 binaries (even if they are contained in the JAR) will fail. When installing the CUDA toolkit (at least on Windows), it always installs a "matching" graphics card driver. And I think that the CUDA DLLs expect a certain functionality from the DLLs that are installed with the driver. (I'm not sure - I would have to try it out, by trying to deploy a CUDA 11 application (which may just be the "device query" example, as an executable) to a PC where an "old" (CUDA 10) driver is installed)
  • How large is the result? Seriously, this could be a killer argument. The DLLs in the CUDA/v11.0/bin directory for me right now amount to ~1.5 gigabytes, and the cuDNN DLLs add another gigabyte. This is one version, for one operating system. Imagine supporting 5 versions, for 2-3 operating systems, and you'll quickly end up juggling a few dozen gigabytes of JAR files. (I'm not gonna upload that to Maven Central, that's for sure...)

Again, it would be really desirable to improve the flexibility here, and make it easier (or rather: possible) to deploy JCuda-based applications to different target machines. But I have strong doubts that this is feasible in practice...

jcuda avatar Sep 19 '20 18:09 jcuda

Very roughly speaking, the JavaCPP classes have some magic System.loadLibrary("mkl") in one of their class initializers. When the class is loaded, System.loadLibrary("mkl") is called. This loads the real, actual MKL implementation library (let's call it MKL.DLL). Then you can use System.loadLibrary("neanderthal-mkl");, which loads your neanderthal-mkl.DLL, and rely on the fact that the underlying MKL implementation was already loaded, right?

Yes. Maybe a bit less involved.

And if I understood this correctly, then the main benefit is that the user does not have to install the real, actual MKL implementation, and does not have to think about where to place the MKL.DLL, because it was "magically" loaded by the JavaCPP class, right?

Yes.

  • Where is the actual MKL implementation located when it is loaded by JavaCPP? Is there some javacpp-mkl.jar that contains the MKL.DLL and MKL.SO implementation libraries?

Something like that, and it is even better, since they distribute all versions through the Maven repository (so you can pick and choose), and the JARs for javacpp-mkl (JNI, Java, etc.) and the 700MB distribution JARs (per OS) are separate. So, basically, yes.

  • How do they handle versioning? Specifically: How do you make sure that the MKL.DLL that you are loading via JavaCPP is the right one that matches your neanderthal-mkl.dll?

Neanderthal is built for mkl_rt, which in turn loads whatever is appropriate for the hardware (AVX-512 etc.). Since the MKL API is rather stable, any version will work. But you can of course control the version, since javacpp distributes different versions in JARs, so you'd pick something like "1.5.4-2020.3".
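For illustration, pinning such a redistributable in a project's POM might look as follows. The coordinates, version string, and classifier are assumptions to verify against the bytedeco repository, not confirmed values:

```xml
<!-- Hypothetical coordinates: the redist JAR carrying the MKL natives.
     The version string combines the MKL version and the JavaCPP version;
     the classifier selects the target platform. -->
<dependency>
  <groupId>org.bytedeco</groupId>
  <artifactId>mkl</artifactId>
  <version>2020.3-1.5.4</version>
  <classifier>windows-x86_64-redist</classifier>
</dependency>
```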

  • How large is the implementation, i.e. the actual MKL.DLL?

Quite large.

(Maybe not all of them are relevant, if I understood something wrong)

I already considered packing the real CUDA libraries into the JARs, so that they can be loaded as necessary. But this raises several possible killer questions:

  • Is this compatible with the CUDA licensing model? I'm pretty sure it is not. NVIDIA is very restrictive here (consider the fact that you have to register there to even just download cuDNN. Did you read the "Terms And Conditions" that you agreed to when downloading it? Sure, neither did I ;-) ). I once tried to find reliable information, and think that only the "cudart_..." library can be distributed to end users this way. But I am not a lawyer...

I'm not sure, but:

  1. The distribution is done by javacpp and official maven repositories. I guess that it's either OK, or at least they're on the hook for that.
  • How well does this work for CUDA, considering the strong connection between CUDA and the underlying graphics card driver? For example, I assume that if someone installs a graphics card driver in a certain version, and this version supports CUDA 10, then trying to load the CUDA 11 binaries (even if they are contained in the JAR) will fail. When installing the CUDA toolkit (at least on Windows), it always installs a "matching" graphics card driver. And I think that the CUDA DLLs expect a certain functionality from the DLLs that are installed with the driver. (I'm not sure - I would have to try it out, by trying to deploy a CUDA 11 application (which may just be the "device query" example, as an executable) to a PC where an "old" (CUDA 10) driver is installed)

The user would be responsible for using the "right" version. Here's what I do for Neanderthal:

  1. Neanderthal only cares that MKL is on the path. How it's provided is not my business.
  2. If the user provides a javacpp-mkl redist on the classpath (through Maven or other means), Neanderthal will pick that up.
  3. If not, the system-wide MKL will be picked up. If there's no system-wide MKL, the load fails.
  • How large is the result? Seriously, this could be a killer argument. The DLLs in the CUDA/v11.0/bin directory for me right now amount to ~1.5 gigabytes, and the cuDNN DLLs add another gigabyte. This is one version, for one operating system. Imagine supporting 5 versions, for 2-3 operating systems, and you'll quickly end up juggling a few dozen gigabytes of JAR files. (I'm not gonna upload that to Maven Central, that's for sure...)

The point is exactly to avoid that: the users provide the relevant javacpp CUDA redistributable in the Maven POMs of their own projects. If they do not provide it, the system-wide CUDA should be used.

Please see here what these JARs look like: https://repo1.maven.org/maven2/org/bytedeco/cuda/11.0-8.0-1.5.4/

javacpp's CUDA distribution is basically a zipped folder with the CUDA DLLs and a tiny mechanism to unpack them into a temp folder and load them appropriately.
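That unpack-and-load mechanism can be sketched in a few lines: copy a library bundled on the classpath into a temp directory, then hand the absolute path to System.load. This is a simplified illustration, not JavaCPP's actual implementation, and the file names are made up:

```java
import java.io.*;
import java.nio.file.*;

public class UnpackDemo {
    // Copy a (native library) stream into a fresh temp directory and
    // return the path that System.load(...) could then be called with.
    static Path unpack(InputStream in, String name) throws IOException {
        Path dir = Files.createTempDirectory("natives");
        Path target = dir.resolve(name);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // In the real case this would be something like
        // getResourceAsStream("/org/bytedeco/cuda/.../cudart64_110.dll").
        InputStream fake = new ByteArrayInputStream("fake-dll-bytes".getBytes());
        Path lib = unpack(fake, "cudart64_110.dll");
        System.out.println(Files.size(lib));   // → 14
        // System.load(lib.toAbsolutePath().toString());  // would load it
    }
}
```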

blueberry avatar Sep 19 '20 18:09 blueberry

Something like that, and it is even better, since they distribute all versions through maven repository

I see. Again, very roughly speaking, the goal would be that the end user just adds https://mvnrepository.com/artifact/org.bytedeco/cuda/11.0-8.0-1.5.4 to the dependencies, then does some small, magic call that causes some "JavaCPP CUDA root class" to be loaded (implicitly loading the natives for the target OS), and then loads one of the JCuda classes, which will load the JCuda natives, which in turn pick up the (already loaded) CUDA libraries.

(This would essentially avoid the "UnsatisfiedLinkError: Can't find dependent libraries" that otherwise would occur when the matching CUDA libraries are not in the PATH. The "implicit fallback" of using the system-wide libraries that you described for Neanderthal+MKL should work transparently here)

In general, that sounds like an approach that could really make the deployment easier. And if I understood this correctly, then the versioning issue should be solved automatically, by making sure that the Maven Dependency versions match (i.e. JCuda 11.0.0 and JavaCPP CUDA 11.0-8.0-1.5.4).
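For illustration, keeping the two versions in lock-step might look like this in a project's POM. The jcuda coordinates follow the published org.jcuda artifacts; whether the bytedeco cuda-platform artifact alone suffices, or a platform-specific redist classifier is additionally needed, is an assumption to verify:

```xml
<!-- Versions must be kept in lock-step by the user:
     JCuda 11.0.0 pairs with CUDA 11.0, matched here by hand. -->
<dependency>
  <groupId>org.jcuda</groupId>
  <artifactId>jcuda</artifactId>
  <version>11.0.0</version>
</dependency>
<dependency>
  <groupId>org.bytedeco</groupId>
  <artifactId>cuda-platform</artifactId>
  <version>11.0-8.0-1.5.4</version>
</dependency>
```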

I think the best path to try this out would be with JCudnn: There, the actual cuDNN DLLs still have to be copied into the project directory, manually, by the user, because there is no installer or standard location for them (or their directory has to be added to the PATH, also manually).

I'll try to allocate some time to see whether this could work for JCuda.

jcuda avatar Sep 21 '20 16:09 jcuda

I gave this a shot. And ... I'm somewhat surprised that, during my first, very basic test ... this actually seemed to work! :-D

I have created an example project at https://github.com/jcuda/jcuda-javacpp-example

It uses JCuda 10.1.0 (and I currently have CUDA 11.2 installed). Usually (and I tried this out), starting the sample program would cause our beloved "cannot find dependent libraries" error, due to the wrong version. But when using the JavaCPP classes to preload the CUDA base library (and nvRTC for the sample) with version 10.1, it works.

@blueberry You might want to have a look at the POM, to see whether this is what you had in mind, or whether you notice any issues. I have essentially bundled the functionality for preloading the libraries in a small helper class.
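A minimal sketch of what such a preloading helper might look like, using reflection so that it compiles without the JavaCPP JARs and degrades gracefully to the system-wide libraries when they are absent. The presets class name in main is the JavaCPP naming scheme as I understand it; treat it as an assumption, and note this is not the actual helper from the example project:

```java
import java.lang.reflect.Method;

public class JavaCppPreloader {
    /**
     * Try to let JavaCPP extract and load the natives for the given
     * presets class; return whether that succeeded.
     */
    public static boolean preload(String presetsClassName) {
        try {
            Class<?> presets = Class.forName(presetsClassName);
            Class<?> loader = Class.forName("org.bytedeco.javacpp.Loader");
            Method load = loader.getMethod("load", Class.class);
            load.invoke(null, presets);
            return true;
        } catch (Throwable t) {
            // JavaCPP or the redist JAR is not on the classpath:
            // fall back to whatever CUDA is installed system-wide.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed presets class name; succeeds only when the
        // bytedeco cuda JARs are actually on the classpath.
        boolean ok = preload("org.bytedeco.cuda.global.cudart");
        System.out.println(ok);
    }
}
```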

BTW: I'm blatantly exploiting the library loading mechanisms of JavaCPP here. Although I had a look at the JavaCPP Loader class, and it essentially does ~"something similar" to my LibUtils, it also does much more (or as its JavaDoc says: "... a bit of everything that does not fit anywhere else"), so I didn't understand the process in all depth. It's basically deriving the name of the library from the annotations, like

@Properties(inherit = cudart.class, value = {
    @Platform(include = "<cudnn.h>", link = "cudnn@.7"),
    @Platform(value = "windows-x86_64", preload = "cudnn64_7")},
        target = "org.bytedeco.cuda.cudnn", global = "org.bytedeco.cuda.global.cudnn")

of the "presets" classes, but I haven't looked at all details yet.

jcuda avatar Dec 24 '20 22:12 jcuda