igemm function raises an error and returns a wrong result when the inner dim is small

Open Little0o0 opened this issue 1 year ago • 1 comments

System Info

CUDA version: 11.8
torch version: 2.0.0

Reproduction

import torch
from bitsandbytes.functional import igemm

def test_igemm():
    inner_dim = 10  # fails with 10; works with e.g. 100
    X = torch.randint(0, 10, (1024, inner_dim), dtype=torch.int8).cuda()
    W = torch.randint(0, 10, (inner_dim, 1024), dtype=torch.int8).cuda()
    X_out = igemm(X, W)
    print(X_out)

test_igemm()

"CUBLAS ERROR: Status 15" appears in the terminal, and X_out is an all-zero matrix (wrong result).

Expected behavior

When inner_dim is large (e.g. 100), igemm works correctly, and I would expect the same for small values. Stepping through with a breakpoint, I found that the error comes from the lib.cigemm() call in functional.py, line 1729.

Jan 16 '24 08:01 Little0o0

inner_dim and the output channel dimension both need to be multiples of 4, e.g. 4, 8, 12, 16, ...

Apr 06 '24 11:04 Little0o0
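
One possible workaround for the multiple-of-4 requirement is to zero-pad the operands before the call and crop the result afterwards. The sketch below is only an illustration and has not been tested against bitsandbytes: round_up_to_multiple and matmul_with_padding are hypothetical helper names, and the matmul argument is assumed to be a callable such as igemm that takes two 2-D int8 tensors of shapes (m, k) and (k, n).

```python
def round_up_to_multiple(n: int, multiple: int = 4) -> int:
    """Return the smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def matmul_with_padding(X, W, matmul, multiple: int = 4):
    """Hypothetical wrapper: zero-pad X (m, k) and W (k, n), multiply, crop.

    `matmul` is assumed to be a callable like igemm(X, W).
    """
    # Imported here so the size helper above works without torch installed.
    import torch.nn.functional as F

    k, n = W.shape
    k_pad = round_up_to_multiple(k, multiple)
    n_pad = round_up_to_multiple(n, multiple)

    # F.pad takes (left, right) pairs, starting from the last dimension.
    X_p = F.pad(X, (0, k_pad - k))                # extra zero columns in X
    W_p = F.pad(W, (0, n_pad - n, 0, k_pad - k))  # extra zero columns and rows in W

    # Zero padding along k adds nothing to the dot products;
    # the padded output columns are dropped by the slice.
    return matmul(X_p, W_p)[:, :n]
```

With the reproduction above, matmul_with_padding(X, W, igemm) would pad inner_dim from 10 to 12 and leave the 1024 output columns unchanged, since 1024 is already a multiple of 4.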