Matthew Douglas comments

Results: 140 comments by Matthew Douglas

Tensor support for quantize_nf4

`blocksize` must be in `[64, 128, 256, 512, 1024, 2048, 4096]`. I am able to reproduce this with `blocksize=4096` but not any of the other options. I should have a...
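As a rough illustration of the constraint quoted above, the check can be sketched in a few lines of Python. The helper name `check_blocksize` is hypothetical and is not part of the bitsandbytes API; only the list of allowed values comes from the comment.

```python
# Allowed block sizes, copied from the comment above.
ALLOWED_BLOCKSIZES = (64, 128, 256, 512, 1024, 2048, 4096)


def check_blocksize(blocksize: int) -> int:
    """Hypothetical helper: reject block sizes outside the allowed list."""
    if blocksize not in ALLOWED_BLOCKSIZES:
        raise ValueError(
            f"blocksize must be one of {list(ALLOWED_BLOCKSIZES)}, got {blocksize}"
        )
    return blocksize


if __name__ == "__main__":
    print(check_blocksize(4096))
```

Validating up front like this only catches unsupported values; it says nothing about why `blocksize=4096` in particular reproduces the problem.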

A bit of confusion about the function kdequant_mm_int32_fp16().

Note the explanation in this comment: `// each block processes SUBTILE_ROWS*32 elements`. Do you see the results you expect when you look at the full output?

Remove separate NO_CUBLASLT build.

> Did Tim say why it's "probably not safe"? Do we know of an actual situation where cublaslt isn't available? Is such a situation something we want to support? I'm...

Feature Request: ROCm support (AMD GPU)

> @TimDettmers @Titus-von-Koeller , we are at ~95% parity for bnb for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled on Instinct class gpus, and working to close the gaps on Navi. At this point, we should...

Initial kernel changes to support GaLore

Updated with changes added for 1-state optimizers (Momentum, RMSProp, Adagrad, Lion).

error when installing unsloth in a docker image

This looks like triton can't find the CUDA driver API library, `libcuda.so`. @abstrcode Can you share your full `Dockerfile`, or at least the image you're starting from? One thing comes...
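One quick way to check whether the driver library is visible inside a container is to ask the system's library lookup for it. This is a minimal sketch using only the Python standard library; it is not the lookup that triton itself performs, and the function name is an assumption made for illustration.

```python
import ctypes.util


def find_cuda_driver_library():
    """Return what the system lookup reports for libcuda, or None if not found."""
    # find_library("cuda") searches for libcuda.so on Linux.
    return ctypes.util.find_library("cuda")


if __name__ == "__main__":
    found = find_cuda_driver_library()
    print(found if found else "libcuda was not found on the default search path")
```

A `None` result suggests the driver library is not exposed to the container, which would be consistent with the diagnosis above but does not confirm it.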

This is not working on Google Colab

@gauravjoshi2034 What version of `bitsandbytes` are you trying this with? It looks to me like it might be `=0.40.0` then that should solve your issue.

FPE in quantize_blockwise

I can reproduce this behavior. We get a division by zero because `blocksize` is a 64-bit `long long` and overflows. Is there a practical reason or need for `blocksize` that...
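The excerpt is truncated, so the exact mechanism is not shown here. As one illustration of how an oversized integer can turn into a zero divisor, the sketch below assumes the value is narrowed to 32 bits somewhere along the way; that narrowing step and both function names are assumptions, not taken from the bitsandbytes source.

```python
import ctypes


def narrow_to_int32(value: int) -> int:
    """Simulate storing a wide integer in a 32-bit signed int (wraps silently)."""
    return ctypes.c_int32(value).value


def num_blocks(n: int, blocksize: int) -> int:
    """Hypothetical block count with a guard against a zero or negative divisor."""
    bs = narrow_to_int32(blocksize)  # assumed narrowing step
    if bs <= 0:
        raise ValueError(f"blocksize {blocksize} is unusable after narrowing ({bs})")
    return (n + bs - 1) // bs  # ceiling division


if __name__ == "__main__":
    print(narrow_to_int32(2**32))   # the wrapped value
    print(num_blocks(10000, 4096))  # an ordinary case
```

In C or CUDA an integer division by the wrapped value would raise a floating point exception rather than a Python error, which is why a guard before the division is the usual defensive choice.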

bitsandbytes searching cudaso only in /usr/local and does not support other paths

I do agree - we would ideally want it to find the CUDA libraries even if they're in a non-standard path. Note that this command is just for diagnostics though,...

windows tests report

I was able to build with CUDA 12.0 and run the tests on Windows.

**Hardware:**
CPU: i7-12700H
GPU: RTX 3060 Mobile

**Software:**
OS: Windows 11
MSVC: 19.38.33134 (VC++ Toolset 14.38.33130)...