
Adding linear algebra and other array operations

Open • aarmey opened this issue 1 year ago • 13 comments

It looks like this is still missing many matrix operations like QR, SVD, einsum, etc. Is there a clear path to using these with or without MLX?

This has been a similar issue with the PyTorch MPS backend. While there is a long tail of these operations to support, they are essential to many machine learning models. As can be seen in the PyTorch issue, not including them limits the utility of packages like this.

aarmey, Dec 06 '23 14:12

Huge +1 to this. It would be amazing not to have to drop back to NumPy/CPU for these sorts of things.

Datamance, Dec 06 '23 18:12

Hi! I am quite interested in working on this but not really sure how to start. Would someone be able to point me in the right direction?

I would even be open to a short meeting if required.

I work on an M2 Max. Thank you :)

@awni

aymuos15, Dec 08 '23 10:12

Matrix factorizations aren't easily parallelizable on the GPU.

Would QR and SVD only have CPU implementations for now? @awni

nullhook, Dec 10 '23 02:12

We would love to have these operations available directly in MLX. It's not our very top priority, but it's something we intend to add in the future or, even better, accept contributions for.

If you are interested in contributing, here are some thoughts:

  • To the extent that we can avoid writing these from scratch, we should.
  • For the CPU we can use LAPACK and/or Accelerate depending on what's available in each. A good starting point would be to wrap an op from one of those just for the CPU (and throw for the GPU).
  • On the GPU there are also some pre-written kernels we can use from MPS, for example Cholesky (https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixdecompositioncholesky?language=objc). You can see an example of how to wrap MPS matmul. The others could be done similarly.
  • For ops not supported by MPS, we'd need kernels which is a bigger project, but a fun one for those up for a challenge!

awni, Dec 14 '23 21:12
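The "wrap an op for the CPU and throw for the GPU" suggestion can be sketched in plain Python. This is only an illustration under stated assumptions, not MLX code: `cholesky` and `_cholesky_cpu` are hypothetical names, the `device` string stands in for MLX's real device/stream dispatch, and the pure-Python loop stands in for a LAPACK `potrf` call.

```python
import math


def _cholesky_cpu(a):
    """Return lower-triangular L with a == L @ L^T (Cholesky-Banachiewicz).

    Pure-Python stand-in for a LAPACK potrf call. `a` is a list of rows and
    must be symmetric positive definite.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky(a, device="cpu"):
    """Hypothetical op entry point: CPU path only, throw for anything else."""
    if device == "cpu":
        return _cholesky_cpu(a)
    raise NotImplementedError(f"cholesky is not implemented on {device!r} yet")
```

Each entry of `low` depends on entries computed before it; that sequential dependency is one reason factorizations map less naturally onto the GPU than elementwise ops or matmul.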

Thoughts on wrapping these linalg-specific functions in a separate module in the Python frontend?

j-csc, Dec 14 '23 22:12

So you can look at how mlx.core.random works. We could do something similar for mlx.core.linalg. Basically, it's a nested namespace on the C++ side (mlx::core::random), and then we make it a submodule in the pybind11 bindings. Then you can do:

import mlx.core as mx
mx.linalg.< >

awni, Dec 14 '23 22:12

Any thoughts on implementing at least vector/matrix norm methods such as torch.linalg.vector_norm?

gboduljak, Dec 15 '23 02:12

Something like np.linalg.norm for vectors, plus the Frobenius norm for matrices, should be very easy to do. That's also a good place to start, just to get the packaging set up.

awni, Dec 15 '23 04:12
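A minimal sketch of the two norms mentioned, written in plain Python over lists rather than MLX arrays. `vector_norm` and `frobenius_norm` are hypothetical names, and only the default behaviour of `np.linalg.norm` (2-norm for 1-D input, Frobenius norm for 2-D input) is mirrored.

```python
import math


def vector_norm(x):
    """Euclidean (L2) norm of a flat sequence, like np.linalg.norm(x) for 1-D x."""
    return math.sqrt(sum(v * v for v in x))


def frobenius_norm(a):
    """Frobenius norm of a matrix given as a list of rows.

    Square root of the sum of squared entries, i.e. what np.linalg.norm(a)
    returns by default for a 2-D input.
    """
    return math.sqrt(sum(v * v for row in a for v in row))
```

A production version would also guard against overflow for very large entries, for example by scaling by the largest magnitude before squaring.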

Note to self: almost all LAPACK routines expect column-major storage.

@awni, would transposing an MLX array before sending it to LAPACK routines work here, or is there an alternative?

nullhook, Dec 15 '23 22:12

No, I wouldn't deal with that using a transpose. You can usually call the routine with the right arguments and avoid the transpose. For example, a row-major [M, N] matrix has the same memory layout as a column-major [N, M] matrix.

awni, Dec 16 '23 05:12
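The layout equivalence can be checked with index arithmetic alone. This sketch uses a flat Python list as the "buffer"; the helper names are hypothetical and nothing here calls LAPACK or MLX.

```python
def row_major_offset(i, j, n_cols):
    """Flat offset of element (i, j) in a row-major matrix with n_cols columns."""
    return i * n_cols + j


def col_major_offset(i, j, n_rows):
    """Flat offset of element (i, j) in a column-major matrix with n_rows rows."""
    return i + j * n_rows


def read_col_major(buf, rows, cols):
    """Interpret a flat buffer as a column-major rows x cols matrix (list of rows)."""
    return [[buf[col_major_offset(r, c, rows)] for c in range(cols)] for r in range(rows)]


# A = [[1, 2, 3], [4, 5, 6]] stored row-major as an [M, N] = [2, 3] matrix.
M, N = 2, 3
buf = [1, 2, 3, 4, 5, 6]

# Element (i, j) of the row-major [M, N] view and element (j, i) of the
# column-major [N, M] view live at the same offset in buf.
same_offsets = all(
    row_major_offset(i, j, N) == col_major_offset(j, i, N)
    for i in range(M)
    for j in range(N)
)

# So a column-major routine told the buffer is [N, M] sees A transposed.
a_t = read_col_major(buf, N, M)  # [[1, 4], [2, 5], [3, 6]]
```

A column-major routine handed a row-major buffer with M and N swapped therefore operates on Aᵀ without any copy, so the call can often be arranged around that (for example via the routine's transpose argument, where it has one).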

Hi @awni, are there any learning resources for Apple's Metal and Accelerate frameworks? I want to contribute to the linalg module but don't know where to start. For instance, if I wanted to build mx.linalg.eig, how would I use LAPACK from Apple's Accelerate framework?

rickypang0219, Dec 22 '23 17:12

SVD support would be great.

ivanfioravanti, Feb 23 '24 23:02

The CPU versions of these are pretty doable. See the QR factorization as an example https://github.com/ml-explore/mlx/blob/main/mlx/backend/common/qrf.cpp

GPU support is more involved, as I don't think there are many open-source Metal implementations.

awni, Feb 24 '24 04:02
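For a sense of what a CPU factorization routine computes, here is a thin QR in pure Python using modified Gram-Schmidt. It is a hypothetical illustration (`qr_mgs` is not an MLX function) and not how the linked `qrf.cpp` works; the thread's suggestion is to delegate CPU ops to LAPACK/Accelerate, whose QR routine (`geqrf`) uses Householder reflections, which are numerically more robust than Gram-Schmidt.

```python
import math


def qr_mgs(a):
    """Thin QR of an m x n matrix (list of rows, m >= n, full column rank).

    Modified Gram-Schmidt: returns (q, r) where q is m x n with orthonormal
    columns and r is n x n upper triangular, so that a == q @ r.
    """
    m, n = len(a), len(a[0])
    # Work on columns: v[j] starts as column j of a.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalize the current (already orthogonalized) column.
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        qj = [x / r[j][j] for x in v[j]]
        q_cols.append(qj)
        # Remove the qj component from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * v[k][i] for i in range(m))
            v[k] = [v[k][i] - r[j][k] * qj[i] for i in range(m)]
    # Convert the list of columns back to a list of rows.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

The outer loop over `j` is inherently sequential (column `j` must be orthogonalized against every earlier column), which echoes the earlier point that factorizations are harder to map onto the GPU than matmul.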