Remove separate NO_CUBLASLT build.

Open matthewdouglas opened this pull request 1 year ago • 6 comments

This PR removes the build option NO_CUBLASLT. It additionally removes the runtime check that decides whether to load the separate nocublaslt variants of the library.

Reasoning:

  • Having separate library builds adds complexity and extra build time
  • Since CUDA 11, libcublas itself already depends on libcublasLt
  • We have runtime checks against compute capability to avoid calling library functions that would be unsupported (see the sketch below)
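
For illustration, the runtime check amounts to something like this minimal host-side sketch (the function name is hypothetical, not the actual bitsandbytes code):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Query the active device's compute capability at runtime, so the
// decision to use int8 tensor core kernels (sm_75+) is made when the
// library runs rather than when it is built.
bool supports_igemmlt() {
    int device = 0;
    cudaDeviceProp prop{};
    if (cudaGetDevice(&device) != cudaSuccess) return false;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return (prop.major * 10 + prop.minor) >= 75;
}

int main() {
    printf("int8 tensor core path available: %s\n",
           supports_igemmlt() ? "yes" : "no");
    return 0;
}
```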

So far I've only tested this on an RTX 3060. I do have access to a machine with a GTX 1660, so I'll try to test on that too.

matthewdouglas avatar Mar 04 '24 20:03 matthewdouglas

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions[bot] avatar Mar 04 '24 20:03 github-actions[bot]

If you want, I have a GTX 1070 (under WSL2, works surprisingly well) I can test on:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.36                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070        On  | 00000000:2D:00.0  On |                  N/A |
|  0%   62C    P0              37W / 185W |   1204MiB /  8192MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

akx avatar Mar 05 '24 15:03 akx

Hey @matthewdouglas @akx,

Tim mentioned that it's probably not safe to remove this.

What's your opinion on this? How can we be certain either way? Right now, I'm not sure how best to proceed.

Titus-von-Koeller avatar Apr 09 '24 09:04 Titus-von-Koeller

Did Tim say why it's "probably not safe"? Do we know of an actual situation where cublaslt isn't available? Is such a situation something we want to support?

akx avatar Apr 09 '24 10:04 akx

> Did Tim say why it's "probably not safe"? Do we know of an actual situation where cublaslt isn't available? Is such a situation something we want to support?

I'm curious too, but I think there might also just be a naming issue here, since cublasLt has shipped with the CUDA Toolkit since v10.1. It may have been placed in some unusual spots back then, but by the time toolkit 11.0 came around that was no longer an issue, and we should always be able to link against it. PyTorch binaries ship with it. And if I'm not mistaken, libcublas.so itself depends on libcublasLt.so these days.
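
If it helps, that dependency is easy to confirm from code. Here's a standalone sketch (my own, not anything from this repo; the `.so.12` suffix assumes a CUDA 12 toolkit): load libcublas on its own, then ask the dynamic loader via RTLD_NOLOAD whether libcublasLt is already mapped, which it will be if libcublas pulled it in.

```cuda
#include <dlfcn.h>
#include <cstdio>

int main() {
    // Load libcublas by itself; adjust the SONAME for your toolkit version.
    void* cublas = dlopen("libcublas.so.12", RTLD_NOW);
    if (!cublas) {
        fprintf(stderr, "could not load libcublas: %s\n", dlerror());
        return 1;
    }
    // RTLD_NOLOAD succeeds only if the library is *already* mapped,
    // i.e. it was dragged in as a dependency of libcublas.
    void* lt = dlopen("libcublasLt.so.12", RTLD_NOW | RTLD_NOLOAD);
    printf("libcublasLt pulled in by libcublas: %s\n", lt ? "yes" : "no");
    return 0;
}
```

Running `ldd` against libcublas.so should show the same thing.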

The main differentiator here is support for int8 tensor cores (i.e. the check for compute capability >= 7.5). So we would have to make sure not to call F.igemmlt() on devices below that threshold. But linking in the libcublasLt code IMO shouldn't be a problem as long as we're not trying to run the unsupported matmul ops, and there's already a path in MatMul8bitLt for that. We can guard some device code with __CUDA_ARCH__ too.
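
To make the __CUDA_ARCH__ part concrete, here's roughly the shape I have in mind (a hypothetical kernel, not code from this PR): the int8 body is compiled only for sm_75+, and the host side pairs it with the runtime capability check so the guarded path is never launched on older devices.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: on < sm_75 the body compiles to a stub, so the
// binary still loads everywhere even with libcublasLt linked in.
__global__ void int8_path_kernel(const signed char* a, const signed char* b,
                                 int* out, int n) {
#if __CUDA_ARCH__ >= 750
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (int)a[i] * (int)b[i];  // stand-in for the real int8 work
#endif
}

int main() {
    int dev = 0, major = 0, minor = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev);
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev);
    // Runtime guard mirrors the compile-time one: skip the launch below sm_75.
    if (major * 10 + minor >= 75)
        int8_path_kernel<<<1, 32>>>(nullptr, nullptr, nullptr, 0);
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}
```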

Some places where cublasLt is used (a simplified sketch of the first one follows the list):

  • int igemmlt<int, int, int>(cublasLtHandle_t ltHandle, int m, int n, int k, ...)
  • cublasLtOrder_t get_order<int>()
  • void transform<T, SRC, TARGET, transpose, DTYPE>(cublasLtHandle_t, T *A, T *out, int dim1, int dim2)
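
For reference, here's a trimmed-down sketch of what that igemmlt-style call looks like through the cublasLt API. This is my own simplification (fixed sizes, plain column-major TN layout, heuristic algo selection, no COL32 transforms or real error handling), not the actual implementation:

```cuda
#include <cublasLt.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdint>
#include <vector>

#define CHECK(x) do { if ((x) != CUBLAS_STATUS_SUCCESS) { \
    printf("cublasLt error at line %d\n", __LINE__); return 1; } } while (0)

int main() {
    // C(int32, m x n) = A^T(int8, m x k) * B(int8, k x n).
    // int8 GEMMs carry alignment/layout restrictions; TN with leading
    // dimensions that are multiples of 4 is the safe configuration.
    const int m = 8, n = 8, k = 8;
    std::vector<int8_t> hA(k * m, 1), hB(k * n, 2);  // A stored k x m, used transposed
    std::vector<int32_t> hC(m * n, 0);

    int8_t *dA, *dB; int32_t *dC;
    cudaMalloc(&dA, hA.size()); cudaMalloc(&dB, hB.size());
    cudaMalloc(&dC, hC.size() * sizeof(int32_t));
    cudaMemcpy(dA, hA.data(), hA.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size(), cudaMemcpyHostToDevice);

    cublasLtHandle_t lt;
    CHECK(cublasLtCreate(&lt));

    // int32 accumulation over int8 inputs, with A transposed (TN).
    cublasLtMatmulDesc_t op;
    CHECK(cublasLtMatmulDescCreate(&op, CUBLAS_COMPUTE_32I, CUDA_R_32I));
    cublasOperation_t transA = CUBLAS_OP_T;
    CHECK(cublasLtMatmulDescSetAttribute(op, CUBLASLT_MATMUL_DESC_TRANSA,
                                         &transA, sizeof(transA)));

    cublasLtMatrixLayout_t la, lb, lc;
    CHECK(cublasLtMatrixLayoutCreate(&la, CUDA_R_8I, k, m, k));
    CHECK(cublasLtMatrixLayoutCreate(&lb, CUDA_R_8I, k, n, k));
    CHECK(cublasLtMatrixLayoutCreate(&lc, CUDA_R_32I, m, n, m));

    int32_t alpha = 1, beta = 0;
    // NULL algo lets cublasLt pick a heuristic; no workspace provided.
    CHECK(cublasLtMatmul(lt, op, &alpha, dA, la, dB, lb, &beta,
                         dC, lc, dC, lc, nullptr, nullptr, 0, 0));

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(int32_t), cudaMemcpyDeviceToHost);
    printf("C[0] = %d (expect %d)\n", hC[0], 2 * k);

    cublasLtMatrixLayoutDestroy(la); cublasLtMatrixLayoutDestroy(lb);
    cublasLtMatrixLayoutDestroy(lc); cublasLtMatmulDescDestroy(op);
    cublasLtDestroy(lt);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Compile with something like `nvcc sketch.cu -lcublasLt`.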

Separately, I believe I remember reading somewhere that there's intent to eventually deprecate the int8 matmul path that doesn't use tensor cores as well (F.igemm, MatMul8bit, and also F.vectorwise_quant, F.vectorwise_mm_dequant).

matthewdouglas avatar Apr 09 '24 13:04 matthewdouglas

I'll try to get in touch with Tim to get more details from him and relay the new information you provided. Unfortunately, he didn't give any reasoning at the time.

He's quite unavailable atm, so it might take a few days.

Thanks @matthewdouglas for this thorough and knowledgeable analysis, this was once again very helpful!

Titus-von-Koeller avatar Apr 09 '24 17:04 Titus-von-Koeller