bitsandbytes

Dynamic cuda wrapper

Open wkpark opened this issue 1 year ago • 8 comments

This is a quick-and-dirty proof of concept: a dynamic CUDA library wrapper that does not depend on the CUDA version.

  • [x] cuda_setup/main.py is compatible with older libraries (older libraries meaning libs with CUDA version tags).
  • [x] All cublas*, cusparse*, and cublasLt* functions in use are wrapped using dlsym()/GetProcAddress().
  • [x] The set of CUDA runtime functions linked from the cudart_static lib is minimized.
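
The runtime-lookup idea behind the list above can be sketched in Python with ctypes, which uses the platform's loader and symbol lookup internally. This is a minimal illustration, not the PR's C++ implementation: `LazyLibrary` and `c_math_candidates` are hypothetical names, and the C math library stands in for the CUDA libraries so the sketch runs without a GPU toolkit.

```python
import ctypes
import ctypes.util
import sys


class LazyLibrary:
    """Open a shared library on first use and resolve symbols by name."""

    def __init__(self, candidates):
        self._candidates = list(candidates)  # tried in order
        self._handle = None
        self._symbols = {}

    def _load(self):
        if self._handle is None:
            errors = []
            for name in self._candidates:
                try:
                    # ctypes calls the platform loader (dlopen() on POSIX).
                    self._handle = ctypes.CDLL(name)
                    break
                except OSError as exc:
                    errors.append(f"{name}: {exc}")
            else:
                raise OSError("no candidate could be loaded: " + "; ".join(errors))
        return self._handle

    def symbol(self, name, restype, argtypes):
        if name not in self._symbols:
            try:
                # ctypes looks the name up in the loaded library,
                # as dlsym()/GetProcAddress() would.
                fn = getattr(self._load(), name)
            except AttributeError as exc:
                raise RuntimeError(f"symbol {name!r} not found") from exc
            fn.restype = restype
            fn.argtypes = argtypes
            self._symbols[name] = fn
        return self._symbols[name]


def c_math_candidates():
    """Stand-in candidate names; a real wrapper would list CUDA libraries."""
    if sys.platform == "win32":
        return ["msvcrt"]
    found = ctypes.util.find_library("m")
    return [name for name in (found, "libm.so.6") if name]


math_lib = LazyLibrary(c_math_candidates())
cos = math_lib.symbol("cos", ctypes.c_double, [ctypes.c_double])
print(cos(0.0))
```

Because nothing is resolved until `symbol()` is called, the module importing this wrapper has no link-time dependency on the target library, which is the property the ldd listings in this description are meant to show.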

Ubuntu 20.04 + CUDA 11.8 environment:

$ ldd venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda.so
        linux-vdso.so.1 (0x00007ffc44cc9000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f858d075000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f858d052000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f858d04c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f858ce6a000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f858cd1b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f858cd00000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f858cb0c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f858f3fc000)

Windows 10 + CUDA 11.8 environment (mingw64 terminal):

$ ldd bitsandbytes/libbitsandbytes_cuda.dll
       	ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffd23ed0000)
       	KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffd23650000)
       	KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffd217c0000)
       	msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ffd21f10000)
       	ucrtbase.dll => /c/WINDOWS/System32/ucrtbase.dll (0x7ffd21de0000)
       	VCRUNTIME140.dll => /c/WINDOWS/SYSTEM32/VCRUNTIME140.dll (0x7ffd0d890000)
       	MSVCP140.dll => /c/WINDOWS/SYSTEM32/MSVCP140.dll (0x7ffd0d8b0000)
       	ole32.dll => /c/WINDOWS/System32/ole32.dll (0x7ffd23710000)
       	RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ffd22730000)
       	combase.dll => /c/WINDOWS/System32/combase.dll (0x7ffd23a20000)
       	GDI32.dll => /c/WINDOWS/System32/GDI32.dll (0x7ffd22a20000)
       	win32u.dll => /c/WINDOWS/System32/win32u.dll (0x7ffd21ee0000)
       	gdi32full.dll => /c/WINDOWS/System32/gdi32full.dll (0x7ffd21c20000)
       	msvcp_win.dll => /c/WINDOWS/System32/msvcp_win.dll (0x7ffd21640000)
       	vcruntime140_1.dll => /c/Windows/System32/vcruntime140_1.dll (0x180000)
       	USER32.dll => /c/WINDOWS/System32/USER32.dll (0x7ffd23370000)
       	VCRUNTIME140_1.dll => /c/WINDOWS/SYSTEM32/VCRUNTIME140_1.dll (0x7ffd0d880000)
       	IMM32.DLL => /c/WINDOWS/System32/IMM32.DLL (0x7ffd22650000)

wkpark avatar Feb 15 '24 13:02 wkpark

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions[bot] avatar Feb 15 '24 13:02 github-actions[bot]

Ah, this will definitely need to be rebased after #898 and #1041 get merged.

akx avatar Feb 15 '24 15:02 akx

I'm a little worried that the API or ABI could have changed between 11 and 12, causing very weird runtime errors if we just use symbols from v12 and call them like they were v11, or vice versa. 🤔
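
One possible guard against that kind of mismatch, sketched under assumptions: query the loaded runtime's version and refuse to bind symbols when its major version differs from the one the wrappers were written against. `decode_cuda_version`, `check_major`, and `query_runtime_version` are hypothetical helpers, not part of this PR. The sketch relies on the CUDA runtime encoding its version as 1000 * major + 10 * minor (for example, 11080 for 11.8).

```python
import ctypes


def decode_cuda_version(raw):
    """Split a CUDA runtime version integer into (major, minor)."""
    # The CUDA runtime encodes its version as 1000 * major + 10 * minor.
    return raw // 1000, (raw % 1000) // 10


def check_major(loaded_raw, built_major):
    """Raise if the loaded runtime's major version differs from the build's."""
    major, minor = decode_cuda_version(loaded_raw)
    if major != built_major:
        raise RuntimeError(
            f"loaded CUDA {major}.{minor}, but wrappers were built for {built_major}.x"
        )
    return major, minor


def query_runtime_version(cudart):
    """Call cudaRuntimeGetVersion on an already loaded ctypes handle.

    Not exercised below, since it needs a CUDA runtime library to be present.
    """
    raw = ctypes.c_int(0)
    status = cudart.cudaRuntimeGetVersion(ctypes.byref(raw))
    if status != 0:  # 0 is cudaSuccess
        raise RuntimeError(f"cudaRuntimeGetVersion failed with status {status}")
    return raw.value


print(check_major(11080, built_major=11))
```

A check like this only catches major-version mismatches; it says nothing about whether individual function signatures changed within a major version.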

akx avatar Mar 04 '24 13:03 akx

Hey @wkpark @akx @matthewdouglas,

I'm hoping to merge this in the coming two weeks. One thing I feel we need to discuss is how we can test that this works as intended without introducing any issues.

Do you have any thoughts on that? I'm willing to do the work, but if you already have a few concrete things in mind, that would definitely help speed this up.
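
As one hedged starting point for such a test, a smoke check could confirm that every symbol the wrapper expects is actually exported by the library it loads, before any of them is called. The sketch below is an assumption about what a test might look like, not something from this PR: `missing_symbols` and `load_c_math` are hypothetical names, and the C math library stands in for the CUDA libraries so the check runs anywhere.

```python
import ctypes
import ctypes.util
import sys


def missing_symbols(lib, names):
    """Return the names from `names` that `lib` does not export."""
    # hasattr() triggers ctypes' symbol lookup; a failed lookup raises
    # AttributeError, which hasattr() reports as False.
    return [name for name in names if not hasattr(lib, name)]


def load_c_math():
    """Load a stand-in library; a real test would load the CUDA libraries."""
    if sys.platform == "win32":
        return ctypes.CDLL("msvcrt")
    return ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")


libm = load_c_math()
print(missing_symbols(libm, ["cos", "sin", "no_such_symbol_xyz"]))
```

In a real test, the name list would be the cublas*, cusparse*, and cublasLt* functions the wrapper resolves, run once per supported CUDA version, with an empty result as the passing condition.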

Thanks @wkpark for your initiative on this, really appreciated! Another thing needed would be to resolve the conflicts.

Titus-von-Koeller avatar Apr 08 '24 09:04 Titus-von-Koeller

P.S. What's your opinion on how this fits together with #1052? I don't fully grok that yet. From what I understand, #1052 determines the major CUDA version. Is #1052 dependent on this PR?

Titus-von-Koeller avatar Apr 08 '24 09:04 Titus-von-Koeller

I'm not sure how I feel about this one yet. I'll think about it a little more, but my initial impression is that it feels quite hacky. Maybe this is common in the C++ world, but I'm not used to seeing it.

Edit: This would definitely conflict with what I was thinking with #1126 too.

matthewdouglas avatar Apr 08 '24 13:04 matthewdouglas

I also get a hacky/too-complex feeling here, but I'm not sure what the better option would be.

akx avatar Apr 09 '24 05:04 akx

Thanks for your valuable input ❤️ Really appreciated.

Titus-von-Koeller avatar Apr 09 '24 09:04 Titus-von-Koeller