
Dynamic CUDA driver loader

Open didzis opened this issue 1 year ago • 6 comments

This PR implements an optional dynamic CUDA driver loader and static linking against the CUDA runtime.

As a result, CUDA-enabled binaries can run without recompilation on systems with or without CUDA-supported GPUs (and the CUDA driver). When CUDA is unavailable, they fall back to alternative computation methods.
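The following is a minimal sketch of the idea. It assumes a Linux target with POSIX `dlopen`. The driver library is opened at run time instead of being linked, and the caller falls back to the CPU path if the library is missing or `cuInit` fails. The names `cuda_driver_load()` and `select_backend()` are hypothetical and are not taken from the PR. Only `cuInit` and its `CUDA_SUCCESS == 0` convention come from the CUDA driver API. The static linking of the CUDA runtime that the PR also does is not shown.

```c
#include <dlfcn.h>
#include <stddef.h>

/* cuInit() from the CUDA driver API: CUresult cuInit(unsigned int Flags).
 * CUresult is an enum where CUDA_SUCCESS == 0; int is used here so that this
 * sketch does not need cuda.h at build time (an assumption of the sketch). */
typedef int (*cu_init_fn)(unsigned int flags);

/* Handle kept open for the lifetime of the process once the driver loads. */
static void *cuda_driver_handle = NULL;

/* Hypothetical helper: returns 1 if libcuda could be loaded and initialized. */
static int cuda_driver_load(void) {
    if (cuda_driver_handle != NULL) {
        return 1; /* already loaded by an earlier call */
    }
    /* libcuda.so.1 ships with the NVIDIA driver, not with the CUDA Toolkit. */
    void *lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) {
        return 0; /* no driver installed on this system */
    }
    cu_init_fn cu_init = (cu_init_fn) dlsym(lib, "cuInit");
    if (cu_init == NULL || cu_init(0) != 0) {
        dlclose(lib); /* driver present but unusable, e.g. no GPU */
        return 0;
    }
    cuda_driver_handle = lib;
    return 1;
}

/* Hypothetical backend selection with a CPU fallback. */
static const char *select_backend(void) {
    return cuda_driver_load() ? "cuda" : "cpu";
}
```

A caller would check `select_backend()` once at startup. On a machine without the driver, the call returns `"cpu"` and no CUDA symbol is ever resolved. Because of this, the same binary can be shipped to both kinds of systems.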

didzis, Feb 07 '24 15:02

Hi @didzis, this is a great idea and I'm giving it a try now on Windows, but without success. I built the Windows DLL with -DWHISPER_CUBLAS=1 -DWHISPER_DYNAMIC_CUDA=1 using CUDA 11.8. On a system with CUDA installed it works. On a system without CUDA, however, it fails to load the DLL due to a missing dependency on nvcuda.dll (which is not a redistributable file). CUDA normally installs this file in c:\windows\system32. I couldn't find where in the makefile it gets linked to whisper.dll, though. Any suggestions, or is there something I did wrong? Thanks!

jettoblack, Feb 09 '24 23:02

Hi, this was implemented only for non-Windows systems, but I made an attempt to support the Windows platform in the latest commit. I don't have any means to test it myself, and you may need to change the driver DLL name. Note that a comment states that no static cuBLAS library is available since CUDA Toolkit 12.3.1, so static linking for cuBLAS is disabled. If this approach works for you, a version check for older CUDA Toolkits may solve this. It should work as-is with the dynamic cuBLAS library. However, dynamically linking against any CUDA library defeats the purpose of all this.
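On Windows, the counterpart of libcuda.so is nvcuda.dll, which is the dependency reported above. A portable loader therefore has to choose the library name and the loading call per platform. The sketch below assumes `LoadLibraryA`/`GetProcAddress` on Windows and `dlopen`/`dlsym` elsewhere. The helpers `open_driver_library()` and `driver_symbol()` are hypothetical, and this is not the code from the commit.

```c
#ifdef _WIN32
#include <windows.h>
/* On Windows the CUDA driver is installed as nvcuda.dll in System32. */
static const char *const cuda_driver_name = "nvcuda.dll";
#else
#include <dlfcn.h>
/* On Linux the driver is libcuda.so.1, shipped with the NVIDIA driver. */
static const char *const cuda_driver_name = "libcuda.so.1";
#endif

#include <stddef.h>

/* Hypothetical wrapper: open the driver library, or return NULL if absent. */
static void *open_driver_library(void) {
#ifdef _WIN32
    return (void *) LoadLibraryA(cuda_driver_name);
#else
    return dlopen(cuda_driver_name, RTLD_NOW | RTLD_LOCAL);
#endif
}

/* Hypothetical wrapper: look up a driver API symbol such as "cuInit". */
static void *driver_symbol(void *lib, const char *name) {
#ifdef _WIN32
    return (void *) GetProcAddress((HMODULE) lib, name);
#else
    return dlsym(lib, name);
#endif
}
```

This only helps when nothing else in the binary links to a CUDA DLL at load time. As noted above, a dynamic cuBLAS would bring that load-time dependency back.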

didzis, Feb 10 '24 08:02

@ggerganov, it is also possible to embed the contents of cuda-loader.c into ggml-cuda.cu (tested, it works).

didzis, Feb 10 '24 09:02

My long-term goal for addressing this is to move the backends to dynamic libraries that are loadable at run time. Then we could use a single build for all the backends. I don't think this PR's approach is going to work on Windows, for the reasons already mentioned: some CUDA libraries do not have static versions on Windows, so the executable will depend on the CUDA DLLs regardless.
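The following is a rough sketch of the run-time loadable backend idea. The host binary tries a list of backend libraries in order and keeps the built-in CPU path as the fallback. The plugin file names and the `backend_name` entry-point symbol are illustrative assumptions, not ggml's actual interface. The sketch assumes POSIX `dlopen`.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical entry point that every backend plugin would export. */
typedef const char *(*backend_name_fn)(void);

/* Try each candidate plugin in order and return the name reported by the
 * first one that loads and exports "backend_name"; otherwise return "cpu". */
static const char *load_first_backend(const char *const *paths, size_t n) {
    for (size_t i = 0; i < n; i++) {
        void *lib = dlopen(paths[i], RTLD_NOW | RTLD_LOCAL);
        if (lib == NULL) {
            /* Missing plugin or missing dependency (e.g. no CUDA runtime). */
            fprintf(stderr, "skipping %s: %s\n", paths[i], dlerror());
            continue;
        }
        backend_name_fn name = (backend_name_fn) dlsym(lib, "backend_name");
        if (name != NULL) {
            return name(); /* library stays loaded while the backend is used */
        }
        dlclose(lib);
    }
    return "cpu"; /* built-in fallback compiled into the main binary */
}
```

The trade-off raised later in this thread is visible here: the plugin files, and whatever CUDA libraries they depend on, must be shipped next to the binary.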

slaren, Feb 11 '24 12:02

OK, to me it seems better to aim for the more general solution and not merge this change for now.

ggerganov, Feb 11 '24 15:02

I didn't want to step into the Windows realm with this PR because it was intended as a Linux-only feature, so I reverted this PR to the Linux-only solution.

I also checked multiple CUDA Toolkit releases for Windows. Unfortunately, it is as mentioned before: the cuBLAS static libraries are missing from the Windows releases.

The general solution mentioned above is great; however, it has some disadvantages:

  • for CUDA, it still requires the CUDA Toolkit (a compatible version) to be installed on the target machine;
  • the dynamic libraries must be distributed along with the binary; this is especially noticeable when static libwhisper is embedded into the binary of another application that has its own distribution requirements;
  • the feature is not ready yet; there is still work to be done.

With this PR, the CUDA code is made optional by dynamically loading only libcuda.so (if present), which is part of the NVIDIA kernel driver package. As a result, no CUDA Toolkit is required on the target machine, and there is no interference with any already installed, possibly incompatible toolkit version. A minimum libcuda.so driver version is still required, and that minimum depends on the version of the CUDA Toolkit static libraries used to link the application. To maximize coverage, an older CUDA Toolkit can be used.
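The driver version can be checked at run time through the already loaded library before any CUDA work starts. The CUDA driver API's `cuDriverGetVersion(int *)` reports the newest CUDA version that the installed driver supports. The value is encoded as `1000 * major + 10 * minor`, so 12030 means 12.3. The helper names below are hypothetical. The required value would come from the toolkit used at link time, and the 11.8 figure in the usage note is only an example.

```c
#include <dlfcn.h>
#include <stddef.h>

/* CUresult cuDriverGetVersion(int *driverVersion); CUDA_SUCCESS == 0. */
typedef int (*cu_driver_get_version_fn)(int *version);

/* Split an encoded CUDA version (1000 * major + 10 * minor). */
static void cuda_version_split(int v, int *major, int *minor) {
    *major = v / 1000;
    *minor = (v % 1000) / 10;
}

/* Hypothetical check: the encoding is monotonic, so integers compare directly. */
static int driver_version_ok(int installed, int required) {
    return installed >= required;
}

/* Query the driver version through an already opened libcuda handle.
 * Returns the encoded version, or -1 if the symbol is missing or the call fails. */
static int query_driver_version(void *libcuda) {
    cu_driver_get_version_fn get =
        (cu_driver_get_version_fn) dlsym(libcuda, "cuDriverGetVersion");
    int v = 0;
    if (get == NULL || get(&v) != 0) {
        return -1;
    }
    return v;
}
```

For example, a binary linked against CUDA 11.8 static libraries might call `driver_version_ok(query_driver_version(lib), 11080)` and take the CPU path if the check fails.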

A quote from the NVIDIA documentation here:

Note that in the latter case, the library cuda is not needed. The CUDA Runtime will try to open explicitly the cuda library if needed. In the case of a system which does not have the CUDA driver installed, this allows the application to gracefully manage this issue and potentially run if a CPU-only path is available.

The static cuBLAS library itself does the same: it loads libcuda.so dynamically if needed and available.

Although no native static cuBLAS library is available for Windows, CUDA can be used with Windows Subsystem for Linux 2 (WSL 2). WSL 2 is a Linux system, so this PR still applies out of the box:

The latest NVIDIA Windows GPU Driver will fully support WSL 2. With CUDA support in the driver, existing applications (compiled elsewhere on a Linux system for the same target GPU) can run unmodified within the WSL environment. ... Once a Windows NVIDIA GPU driver is installed on the system, CUDA becomes available within WSL 2. The CUDA driver installed on Windows host will be stubbed inside the WSL 2 as libcuda.so, therefore users must not install any NVIDIA GPU Linux driver within WSL 2.

I understand that there are no other options left for native Windows applications. However, I fail to see any reason not to support both approaches on the Linux platform (or WSL 2 on Windows).

@ggerganov, I believe it's still worth considering merging this optional (and small) feature in one form or another (e.g., the solution could also be merged into ggml-cuda.cu). What do you think, given the above?

didzis, Feb 12 '24 13:02