bitsandbytes
igemm raises an error and returns a wrong result when the inner dim is small
System Info
CUDA version: 11.8
torch version: 2.0.0
Reproduction
    import torch
    from bitsandbytes.functional import igemm

    def test_igemm():
        # inner_dim = 10 is not a multiple of 4 and triggers the error
        inner_dim = 10
        X = torch.randint(0, 10, (1024, inner_dim), dtype=torch.int8).cuda()
        W = torch.randint(0, 10, (inner_dim, 1024), dtype=torch.int8).cuda()
        X_out = igemm(X, W)
        print(X_out)

    test_igemm()
"CUBLAS ERROR: Status 15" appears in the terminal, and X_out is a zero matrix (wrong result).
Expected behavior
When inner_dim is large (e.g. 100), igemm works well.
I set a breakpoint and found that the error comes from the lib.cigemm() call in functional.py, line 1729.
inner_dim and the output channel each need to be a multiple of 4, e.g. 4, 8, 12, 16, ...
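Given the multiple-of-4 constraint, one possible workaround is to zero-pad both dimensions before the call and slice the result afterwards. Zero rows and columns add nothing to the product, so the sliced output matches the unpadded one. The following is only a sketch: `round_up`, `pad_for_igemm`, and `padded_matmul` are hypothetical helper names and not part of bitsandbytes. The multiply function is passed in as a parameter, and it is an assumption (not verified here) that `igemm` accepts the padded tensors unchanged.

```python
import torch
import torch.nn.functional as F


def round_up(n, multiple=4):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def pad_for_igemm(X, W, multiple=4):
    """Zero-pad X (m, k) and W (k, n) so that k and n are multiples of `multiple`.

    Returns the padded tensors and the original n, needed to slice the result.
    """
    k, n = W.shape
    k_pad = round_up(k, multiple) - k
    n_pad = round_up(n, multiple) - n
    # F.pad takes pads for the last dimension first: (left, right, top, bottom).
    X_p = F.pad(X, (0, k_pad))            # extra zero columns on the inner dim
    W_p = F.pad(W, (0, n_pad, 0, k_pad))  # extra zero columns and zero rows
    return X_p, W_p, n


def padded_matmul(X, W, matmul, multiple=4):
    """Pad, multiply with the supplied `matmul`, then drop the padded columns."""
    X_p, W_p, n = pad_for_igemm(X, W, multiple)
    return matmul(X_p, W_p)[:, :n]
```

With the reproduction above, the call would look like `padded_matmul(X, W, igemm)`, which turns inner_dim 10 into 12 while leaving the output channel of 1024 unchanged.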