Awni Hannun
@adhulipa are you planning to come back to this?
@adhulipa are you planning to return to this one?
It's more up to you. If you plan to work on it in the near future, you can keep it open (or start a new one if you prefer)....
I think you can just do something like this:

```python
U, S, V = mx.linalg.svd(A)
K = min(A.shape[0], A.shape[1])
Atilde = (U[:, :K] * S) @ V[:K, :]
```

We...
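For reference, the same truncated-SVD reconstruction identity can be checked with NumPy (a minimal sketch; `np.linalg.svd` returns the transposed right factor `Vt`, which plays the role of `V` above):

```python
import numpy as np

# Illustrative check of the SVD reconstruction identity using NumPy.
A = np.random.default_rng(0).normal(size=(4, 3))
U, S, Vt = np.linalg.svd(A, full_matrices=True)
K = min(A.shape[0], A.shape[1])

# Scale the first K columns of U by the singular values, then
# multiply by the first K rows of Vt to rebuild A.
Atilde = (U[:, :K] * S) @ Vt[:K, :]
assert np.allclose(A, Atilde)
```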
@jagrit06 it seems that we are overflowing an integer index into the output, as it starts to break in the 2B range. INT_MAX is on the small side for the...
> The only things I'm wondering about is if batch_size_out >= UINT32_MAX, then we will need to launch multiple matmul kernels since the grid dims can only be uint That...
So I think your goal is to export `weights`, `scales`, and `biases` to GGUF without dequantizing, so you can load them natively in llama.cpp, right? It is not quite so...
I will leave this open as an enhancement to help prioritize when we can get to it. For the very short-term your best bet is exporting to fp16 (either safetensors...
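A minimal sketch of the fp16 cast that would precede such an export, using NumPy for illustration (the dict and key names here are hypothetical; an actual export would go through something like `mx.save_safetensors` or GGUF tooling):

```python
import numpy as np

# Hypothetical weights dict; in practice these come from the model.
weights = {"layer.0.weight": np.random.default_rng(1).normal(size=(8, 8))}

# Cast every tensor to fp16 before writing, halving the on-disk size
# relative to fp32 while staying loadable by standard tooling.
weights_fp16 = {k: v.astype(np.float16) for k, v in weights.items()}
assert all(v.dtype == np.float16 for v in weights_fp16.values())
```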
This is expected inasmuch as we don't have implementations of these for `int64`. Are you able to use a 32-bit type as a workaround?
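A sketch of that workaround with NumPy (the array here is illustrative): downcast to `int32`, but check the values actually fit first to avoid silent wraparound.

```python
import numpy as np

# Hypothetical int64 data that happens to fit in 32 bits.
idx64 = np.array([0, 5, 1_000_000], dtype=np.int64)

# Verify the range before downcasting -- astype wraps silently otherwise.
assert idx64.max() <= np.iinfo(np.int32).max
assert idx64.min() >= np.iinfo(np.int32).min
idx32 = idx64.astype(np.int32)
assert idx32.dtype == np.int32
```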
It's pretty unlikely we will implement this in the near future because the output shape depends on the input data. MLX is currently not set up well to deal with operations...