Matthew Douglas
`blocksize` must be in `[64, 128, 256, 512, 1024, 2048, 4096]`. I am able to reproduce this with `blocksize=4096` but not any of the other options. I should have a...
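A minimal sketch of an up-front check that rejects unsupported values before any kernel runs. It assumes the allowed set quoted above. `validate_blocksize` is a hypothetical helper and is not part of the `bitsandbytes` API.

```python
# Allowed block sizes, as listed in the comment above.
SUPPORTED_BLOCKSIZES = (64, 128, 256, 512, 1024, 2048, 4096)


def validate_blocksize(blocksize: int) -> int:
    """Hypothetical helper: fail early with a clear message instead of inside a kernel."""
    if blocksize not in SUPPORTED_BLOCKSIZES:
        raise ValueError(
            f"blocksize must be one of {list(SUPPORTED_BLOCKSIZES)}, got {blocksize}"
        )
    return blocksize


validate_blocksize(4096)  # accepted
try:
    validate_blocksize(100)  # not in the supported set
except ValueError as err:
    print(err)
```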
Note the explanation in this comment: `// each block processes SUBTILE_ROWS*32 elements`. Do you see the results you expect when you look at the full output?
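If each block handles `SUBTILE_ROWS*32` elements, then the number of blocks for an input of `n` elements is a ceiling division, and the last block may be only partially filled. The sketch below assumes that layout. It takes `SUBTILE_ROWS` as a parameter because its actual value in the kernel is not stated here, so the value 8 in the usage line is only a placeholder.

```python
def block_layout(n: int, subtile_rows: int) -> tuple[int, int]:
    """Return (num_blocks, elements_in_last_block), assuming each block
    processes subtile_rows * 32 elements (subtile_rows is a placeholder)."""
    per_block = subtile_rows * 32
    num_blocks = (n + per_block - 1) // per_block  # ceiling division
    last = n - (num_blocks - 1) * per_block if num_blocks else 0
    return num_blocks, last


# With a hypothetical SUBTILE_ROWS of 8, each block covers 256 elements.
print(block_layout(1000, 8))  # (4, 232)
```

Looking at only the first block's worth of output can be misleading, because the remaining elements are handled by the other blocks.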
> Did Tim say why it's "probably not safe"? Do we know of an actual situation where cublaslt isn't available? Is such a situation something we want to support? I'm...
> @TimDettmers @Titus-von-Koeller , we are at ~95% parity for bnb for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled on Instinct class gpus, and working to close the gaps on Navi. At this point, we should...
Updated with changes for the 1-state optimizers (Momentum, RMSProp, Adagrad, Lion).
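For context, "1-state" refers to optimizers that keep a single state buffer per parameter. Adam, by contrast, keeps two. The sketch below shows the textbook scalar update rules under these assumptions:

- no weight decay
- plain (non-centered) RMSProp
- illustrative hyperparameter defaults

It is not the `bitsandbytes` kernel code.

```python
import math


def momentum_step(p, g, m, lr=0.01, beta=0.9):
    m = beta * m + g                      # single state: momentum buffer
    return p - lr * m, m


def rmsprop_step(p, g, v, lr=0.01, alpha=0.99, eps=1e-8):
    v = alpha * v + (1 - alpha) * g * g   # single state: running mean of g^2
    return p - lr * g / (math.sqrt(v) + eps), v


def adagrad_step(p, g, v, lr=0.01, eps=1e-10):
    v = v + g * g                         # single state: running sum of g^2
    return p - lr * g / (math.sqrt(v) + eps), v


def lion_step(p, g, m, lr=1e-4, beta1=0.9, beta2=0.99):
    c = beta1 * m + (1 - beta1) * g       # interpolate momentum and gradient
    update = (c > 0) - (c < 0)            # sign(c), with sign(0) == 0
    m = beta2 * m + (1 - beta2) * g       # single state: momentum buffer
    return p - lr * update, m


# Each step takes the parameter, its gradient, and the one state value.
p, m = momentum_step(1.0, 0.5, 0.0)
```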
This looks like Triton can't find the CUDA driver API library, `libcuda.so`. @abstrcode Can you share your full `Dockerfile`, or at least the image you're starting from? One thing comes...
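A quick way to check whether the driver library can be loaded at all from inside the container. This minimal sketch uses only the standard library's `ctypes`, and it is not how Triton itself searches for `libcuda.so`. The candidate names are assumptions. The driver normally provides the versioned `libcuda.so.1`, while the unversioned `libcuda.so` symlink may be missing in some images.

```python
import ctypes
import ctypes.util


def can_load_libcuda() -> bool:
    """Return True if the CUDA driver API library can be dlopen'ed."""
    candidates = ["libcuda.so", "libcuda.so.1"]  # assumed names to try
    found = ctypes.util.find_library("cuda")     # may consult the ldconfig cache on Linux
    if found:
        candidates.insert(0, found)
    for name in candidates:
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            continue
    return False


print("libcuda loadable:", can_load_libcuda())
```

If this prints `False` inside the container but `True` on the host, the driver libraries are likely not being exposed to the container.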
@gauravjoshi2034 What version of `bitsandbytes` are you trying this with? It looks to me like it might be an older release; if so, upgrading to `>=0.40.0` should solve your issue.
I can reproduce this behavior. We get a division by zero because `blocksize` is a 64-bit `long long` and overflows. Is there a practical reason or need for `blocksize` that...
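To illustrate the failure mode: a signed 64-bit integer wraps modulo 2**64, so a large enough `blocksize` can end up as 0, and any later division by it fails. The sketch below simulates two's-complement truncation in Python. The input value 2**64 is a hypothetical example, not the value from the original report, and `num_blocks` is an illustrative helper.

```python
def to_int64(x: int) -> int:
    """Simulate truncating a Python int to a signed 64-bit (two's complement) value."""
    return ((x + 2**63) % 2**64) - 2**63


def num_blocks(n: int, blocksize: int) -> int:
    b = to_int64(blocksize)  # what a 64-bit `long long` would actually hold
    return n // b            # raises ZeroDivisionError when b wrapped to 0


print(to_int64(2**64))  # 0
print(to_int64(2**63))  # -9223372036854775808
try:
    num_blocks(4096, 2**64)
except ZeroDivisionError:
    print("division by zero after wraparound")
```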
I do agree: we would ideally want it to find the CUDA libraries even if they're in a non-standard path. Note that this command is just for diagnostics though,...
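One way to look beyond the default loader paths is to scan the directories named by environment variables. This minimal sketch assumes `CUDA_HOME` and `LD_LIBRARY_PATH` as the hints and `libcudart.so*` as the file pattern. `find_cuda_runtime` is a hypothetical helper, not how `bitsandbytes` actually locates its libraries.

```python
import glob
import os


def find_cuda_runtime(env=None) -> list[str]:
    """Hypothetical helper: list libcudart candidates under env-provided dirs."""
    env = os.environ if env is None else env
    search_dirs = []
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        # Assumed common layouts: <CUDA_HOME>/lib64 and <CUDA_HOME>/lib
        search_dirs += [os.path.join(cuda_home, "lib64"), os.path.join(cuda_home, "lib")]
    search_dirs += [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]

    found = []
    for directory in search_dirs:
        # glob returns [] for directories that do not exist
        for path in sorted(glob.glob(os.path.join(directory, "libcudart.so*"))):
            if path not in found:
                found.append(path)
    return found


print(find_cuda_runtime())
```

Passing an explicit `env` mapping keeps the search testable without modifying the real process environment.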
I was able to build with CUDA 12.0 and run the tests on Windows.

**Hardware:**
- CPU: i7-12700H
- GPU: RTX 3060 Mobile

**Software:**
- OS: Windows 11
- MSVC: 19.38.33134 (VC++ Toolset 14.38.33130)...