
igemm function raises an error and returns a wrong result when the inner dim is small

Open • Little0o0 opened this issue 1 year ago • 1 comment

System Info

CUDA version: 11.8
torch version: 2.0.0

Reproduction

import torch
from bitsandbytes.functional import igemm

def test_igemm():
    inner_dim = 10
    X = torch.randint(0,10, (1024, inner_dim), dtype=torch.int8).cuda()
    W = torch.randint(0,10, (inner_dim, 1024), dtype=torch.int8).cuda()
    X_out = igemm(X, W)
    print(X_out)

test_igemm()

"CUBLAS ERROR: Status 15" appears in the terminal (status 15 is CUBLAS_STATUS_NOT_SUPPORTED in the cuBLAS status enum), and X_out is an all-zero matrix, which is the wrong result.

Expected behavior

igemm should return the correct product without a cuBLAS error when inner_dim is small, as it already does when inner_dim is large (e.g. 100). Stepping through with breakpoints, I found that the error comes from the lib.cigemm() call in functional.py, line 1729.

Little0o0 • Jan 16 '24 08:01

inner_dim and the output channel dimension both need to be multiples of 4, e.g. 4, 8, 12, 16, ...

Little0o0 • Apr 06 '24 11:04
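
The multiple-of-4 constraint from the comment above can be checked before calling igemm, so that a bad shape produces a readable message instead of a cuBLAS status code and a zero matrix. The following is only a minimal sketch: `round_up` and `igemm_shape_problems` are hypothetical helper names that are not part of bitsandbytes, and it assumes "output channel" means the last dimension of W in `igemm(X, W)`.

```python
def round_up(n: int, multiple: int = 4) -> int:
    """Return the smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def igemm_shape_problems(x_shape, w_shape, multiple: int = 4):
    """Hypothetical pre-check for X @ W; returns a list of problems (empty if none).

    Assumption: the constraint applies to the inner dimension (last dim of X)
    and to the output channel dimension (last dim of W).
    """
    problems = []
    inner_dim, out_dim = x_shape[-1], w_shape[-1]
    if inner_dim != w_shape[-2]:
        problems.append(
            f"inner dims do not match: {inner_dim} vs {w_shape[-2]}"
        )
    if inner_dim % multiple != 0:
        problems.append(
            f"inner_dim {inner_dim} is not a multiple of {multiple}; "
            f"pad it to {round_up(inner_dim, multiple)}"
        )
    if out_dim % multiple != 0:
        problems.append(
            f"output dim {out_dim} is not a multiple of {multiple}; "
            f"pad it to {round_up(out_dim, multiple)}"
        )
    return problems


# Shapes from the reproduction above: inner_dim = 10 is flagged.
for problem in igemm_shape_problems((1024, 10), (10, 1024)):
    print(problem)
```

If padding is used as a workaround, zero-padding the inner dimension of both X and W leaves the product unchanged, because the extra terms contribute 0 to every dot product. Padding the output dimension adds extra columns that have to be sliced off the result afterwards.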