
[BUG] Matmul gives wrong output for large sizes

Open · awni opened this issue 1 year ago

Decreasing 131072 to 131071 produces the right output, but at 131072 and above the outputs don't match as they should.


import mlx.core as mx

w = mx.random.uniform(shape=(32, 32 * 4))        # (32, 128) weight
x = mx.random.uniform(shape=(131072, 128, 32))   # large batched input

y1 = x[:10] @ w   # matmul on a small slice of the batch
y2 = x @ w        # matmul on the full batch

# Maximum difference between the sliced and full results; this should be ~0,
# but the full-size matmul produces wrong values.
print((y1 - y2[:10]).max().abs())

awni · Apr 29 '24 13:04

@jagrit06 it seems we are overflowing an integer index into the output, since it starts to break in the 2B range. INT_MAX is on the small side for the largest output we can support, though.
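For reference, the numbers line up with that diagnosis; a quick check of the output size from the shapes in the repro above (plain arithmetic, shown as a small C++ snippet since the index in question is a 32-bit signed int):

#include <climits>
#include <cstdio>

int main() {
  // x @ w in the repro produces an output of shape (131072, 128, 128).
  long long ok  = 131071LL * 128 * 128;  // 2,147,467,264 -> still fits in a signed 32-bit int
  long long bad = 131072LL * 128 * 128;  // 2,147,483,648 == 2^31 -> one past INT_MAX
  std::printf("131071 batch: %lld elements (INT_MAX = %d)\n", ok, INT_MAX);
  std::printf("131072 batch: %lld elements, overflows a 32-bit signed index\n", bad);
  return 0;
}

This matches the threshold in the report: 131071 works, 131072 does not.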

Anything we can do to support larger sizes?

If not, we should put some throws in the ops as these are sneaky to debug.

awni · Apr 29 '24 13:04

This particular case is simple: when we try to compute auto batch_size_out = out.size() / (M * N);, the int M and N multiply and overflow, so batch_size_out comes out to 0. The simple fix here is to do that computation in size_t, and I can make a couple of other changes to make sure we can handle the large shapes.
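A minimal sketch of that failure mode and the size_t fix follows. This is not the actual MLX code; the concrete M and N are assumptions, picked so the batch folds into M and M * N lands at 2^31, consistent with the shapes in the repro:

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
  // Assumed sizes: the batch is folded into M, so M = 131072 * 128 rows
  // and N = 128 columns (illustrative values, not MLX internals).
  int M = 131072 * 128;                            // 16,777,216, still fits in int
  int N = 128;
  std::size_t out_size = 131072ull * 128 * 128;    // 2,147,483,648 output elements

  // M * N would be 2^31, which overflows int. Signed overflow is undefined
  // behavior; on typical targets it wraps, simulated deterministically here:
  auto wrapped = static_cast<std::int32_t>(
      static_cast<std::uint32_t>(M) * static_cast<std::uint32_t>(N));  // INT32_MIN

  // In the division the negative product converts to a huge size_t,
  // so batch_size_out comes out as 0, as described above.
  std::size_t broken = out_size / static_cast<std::size_t>(wrapped);

  // The fix: do the multiply in size_t so it cannot overflow.
  std::size_t fixed = out_size /
      (static_cast<std::size_t>(M) * static_cast<std::size_t>(N));

  std::printf("broken batch_size_out: %zu\n", broken);  // prints 0
  std::printf("fixed  batch_size_out: %zu\n", fixed);   // prints 1 under these assumed sizes
  return 0;
}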

The only thing I'm wondering about is that if batch_size_out >= UINT32_MAX, then we will need to launch multiple matmul kernels since the grid dims can only be uint

jagrit06 · Apr 30 '24 18:04

I tacked on a quick fix with #1058

jagrit06 · Apr 30 '24 18:04

> The only thing I'm wondering about is that if batch_size_out >= UINT32_MAX, then we will need to launch multiple matmul kernels since the grid dims can only be uint

That seems like a much rarer case.

awni · Apr 30 '24 18:04

Thanks guys, really cool work and a fast fix!

thegodone · May 09 '24 14:05