Dynamic cuda wrapper
This is a quick-and-dirty proof of concept of a CUDA-version-independent, dynamic wrapper for the CUDA libraries.
- [x] `cuda_setup/main.py` is compatible with older libraries (i.e. libraries whose file names carry CUDA version tags).
- [x] all used `cublas*`, `cusparse*`, and `cublasLt*` functions are wrapped using `dlsym()`/`GetProcAddress()`.
- [x] minimize the number of CUDA runtime functions linked from the `cudart_static` lib.
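The wrapping approach in the checklist resolves each function by name at runtime instead of linking against it. Below is a minimal Python sketch of the same idea using `ctypes`, which calls `dlopen()`/`dlsym()` on Linux and `LoadLibrary()`/`GetProcAddress()` on Windows: try a list of candidate library names (including version-tagged ones) and look up a symbol in the first one that loads. The helper names and the candidate list are hypothetical illustrations, not the code in this PR.

```python
import ctypes
from typing import Callable, Iterable, Optional


def load_first_available(candidates: Iterable[str]) -> Optional[ctypes.CDLL]:
    """Return a handle to the first library in `candidates` that loads, else None."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)  # dlopen() on Linux, LoadLibrary() on Windows
        except OSError:
            continue  # not installed or not on the search path; try the next name
    return None


def resolve_symbol(lib: ctypes.CDLL, name: str) -> Optional[Callable[..., int]]:
    """Look up `name` at runtime; return None instead of raising if it is missing."""
    try:
        return getattr(lib, name)  # dlsym() on Linux, GetProcAddress() on Windows
    except AttributeError:
        return None


# Hypothetical candidate list: the unversioned name first, then version-tagged
# names of the kind shipped by CUDA 12 and CUDA 11 installations.
CUBLAS_CANDIDATES = (
    "libcublas.so", "libcublas.so.12", "libcublas.so.11",
    "cublas64_12.dll", "cublas64_11.dll",
)

if __name__ == "__main__":
    cublas = load_first_available(CUBLAS_CANDIDATES)
    if cublas is None:
        print("no cuBLAS library found")
    else:
        create = resolve_symbol(cublas, "cublasCreate_v2")
        print("cublasCreate_v2 resolved:", create is not None)
```

Because a missing library or symbol yields `None` rather than a load-time failure, the caller can decide at runtime whether to fall back or report an error, which is the point of not linking against the CUDA libraries directly.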
Ubuntu 20.04 + CUDA 11.8 environment:
```
$ ldd venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda.so
linux-vdso.so.1 (0x00007ffc44cc9000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f858d075000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f858d052000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f858d04c000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f858ce6a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f858cd1b000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f858cd00000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f858cb0c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f858f3fc000)
```
Windows 10 + CUDA 11.8 environment (MinGW64 terminal):
```
$ ldd bitsandbytes/libbitsandbytes_cuda.dll
ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffd23ed0000)
KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffd23650000)
KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffd217c0000)
msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ffd21f10000)
ucrtbase.dll => /c/WINDOWS/System32/ucrtbase.dll (0x7ffd21de0000)
VCRUNTIME140.dll => /c/WINDOWS/SYSTEM32/VCRUNTIME140.dll (0x7ffd0d890000)
MSVCP140.dll => /c/WINDOWS/SYSTEM32/MSVCP140.dll (0x7ffd0d8b0000)
ole32.dll => /c/WINDOWS/System32/ole32.dll (0x7ffd23710000)
RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ffd22730000)
combase.dll => /c/WINDOWS/System32/combase.dll (0x7ffd23a20000)
GDI32.dll => /c/WINDOWS/System32/GDI32.dll (0x7ffd22a20000)
win32u.dll => /c/WINDOWS/System32/win32u.dll (0x7ffd21ee0000)
gdi32full.dll => /c/WINDOWS/System32/gdi32full.dll (0x7ffd21c20000)
msvcp_win.dll => /c/WINDOWS/System32/msvcp_win.dll (0x7ffd21640000)
vcruntime140_1.dll => /c/Windows/System32/vcruntime140_1.dll (0x180000)
USER32.dll => /c/WINDOWS/System32/USER32.dll (0x7ffd23370000)
VCRUNTIME140_1.dll => /c/WINDOWS/SYSTEM32/VCRUNTIME140_1.dll (0x7ffd0d880000)
IMM32.DLL => /c/WINDOWS/System32/IMM32.DLL (0x7ffd22650000)
```
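The `ldd` listings above show that the built library no longer pulls in the CUDA shared libraries at load time. A check like this could be automated; the sketch below parses `ldd` output (both the Linux and the MinGW format shown above) and reports any dependency whose name starts with a CUDA library prefix. The prefix list and the function names are assumptions for illustration, not part of the PR.

```python
import re
import subprocess
import sys
from typing import List

# Hypothetical prefixes (lower-case) of CUDA libraries that should NOT appear as
# load-time dependencies once the wrapper resolves them dynamically.
CUDA_PREFIXES = (
    "libcudart", "libcublas", "libcusparse",          # Linux names (libcublasLt matches libcublas)
    "cudart64", "cublas64", "cublaslt64", "cusparse64",  # Windows DLL names
)


def parse_ldd(output: str) -> List[str]:
    """Extract the dependency names (left-hand column) from `ldd` output."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("$"):
            continue  # skip blank lines and a pasted shell prompt
        # Lines look like "libdl.so.2 => /lib/... (0x...)", "x.so => not found",
        # or "/lib64/ld-linux-x86-64.so.2 (0x...)".
        name = line.split("=>")[0].strip()
        name = re.sub(r"\s*\(0x[0-9a-fA-F]+\)$", "", name)  # drop a trailing load address
        names.append(name)
    return names


def cuda_dependencies(output: str) -> List[str]:
    """Return the dependencies that look like CUDA libraries (case-insensitive)."""
    return [n for n in parse_ldd(output) if n.lower().startswith(CUDA_PREFIXES)]


if __name__ == "__main__" and len(sys.argv) > 1:
    out = subprocess.run(["ldd", sys.argv[1]], capture_output=True, text=True, check=True).stdout
    found = cuda_dependencies(out)
    print("CUDA load-time dependencies:", found or "none")
    sys.exit(1 if found else 0)
```

Run against the built `.so`/`.dll`, a non-zero exit status would flag a regression where a CUDA library is linked directly again.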
Ah, this will definitely need to be rebased after #898 and #1041 get merged.
I'm a little worried that the API or ABI could have changed between CUDA 11 and 12, causing very weird runtime errors if we just use symbols from v12 and call them as if they were v11, or vice versa. 🤔
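One possible guard against the mismatch described above, sketched as an assumption rather than something the PR implements: read the version of the loaded library as an integer (the CUDA runtime's `cudaRuntimeGetVersion()`, for example, encodes it as `1000 * major + 10 * minor`, so 11.8 is `11080`) and refuse to call the dynamically resolved symbols when the major version differs from the one the wrapper was written against. Which library's version gets queried is left open here; `BUILT_MAJOR` and both function names are hypothetical.

```python
from typing import Tuple

# Hypothetical: the CUDA major version whose function signatures the wrapper assumes.
BUILT_MAJOR = 11


def decode_cuda_version(version: int) -> Tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def check_major_matches(loaded_version: int, built_major: int = BUILT_MAJOR) -> None:
    """Raise before any resolved symbol is called if the major versions differ."""
    major, minor = decode_cuda_version(loaded_version)
    if major != built_major:
        raise RuntimeError(
            f"loaded CUDA {major}.{minor}, but the wrapper was built for CUDA {built_major}.x; "
            "refusing to call symbols whose ABI may have changed"
        )
```

Failing loudly at load time trades flexibility for safety: it turns a possible silent ABI mismatch into an explicit error message.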
Hey @wkpark @akx @matthewdouglas,
I'm hoping to merge this in the coming two weeks. One thing I feel we need to discuss is how we can test if this works as intended without introducing any issues:
Do you have any thoughts on that? I'm willing to do the work, but if you already have a few concrete things in mind, that would definitely help speed this up.
Thanks @wkpark for your initiative on this, really appreciated! Another thing needed would be to resolve the conflicts.
P.S. What's your opinion on how this fits together with #1052? I don't fully grok that yet. From what I understand, #1052 determines the major CUDA version. Does #1052 depend on this one?
I'm not sure how I feel about this one yet. I will think about it a little more, but my initial thought is that it feels quite hacky. Maybe this is a common thing in the C++ world, but I'm not used to seeing it.
Edit: This would definitely conflict with what I was thinking with #1126 too.
I also get a hacky/too-complex feeling here, but I'm not sure what the better option would be.
Thanks for your valuable inputs ❤️ Really appreciated.