Awni Hannun

Results: 1,014 comments by Awni Hannun

Something like `np.linalg.norm` for vectors, and the Frobenius norm for matrices, should be very easy to do; that's also a good place to start just to get the packaging set up.
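
Roughly what that would look like on top of existing ops (a minimal sketch only; the function name and argument handling here are placeholders, not the eventual API):

```python
import mlx.core as mx

def norm(a):
    # Minimal sketch: the vector 2-norm and the matrix Frobenius norm are
    # both just the square root of the sum of squared entries, so this
    # reduces entirely to existing ops -- no new kernels needed.
    if a.ndim not in (1, 2):
        raise ValueError("Only vectors and matrices are handled here.")
    return mx.sqrt(mx.sum(a * a))

print(norm(mx.array([3.0, 4.0])))   # 5.0
print(norm(mx.eye(3)))              # sqrt(3), the Frobenius norm
```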

No, I wouldn't deal with that using a transpose. You can usually call the routine with the right arguments and avoid one entirely. For example, a row-major [M, N] matrix...
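
What this presumably refers to (the comment is truncated, so this is a sketch of the standard trick, shown in NumPy for concreteness): a row-major [M, N] buffer is, byte for byte, the column-major [N, M] matrix Aᵀ, so you can pass the untouched buffers to a column-major routine and ask it for the transposed result instead of physically transposing anything.

```python
import numpy as np

# To get C = A @ B from a column-major GEMM without copying data, ask it
# for C^T = B^T @ A^T: hand it B's buffer as the first operand and A's as
# the second, then read the output buffer back as row-major C.
M, K, N = 3, 5, 4
A = np.random.rand(M, K).astype(np.float32)   # row-major [M, K]
B = np.random.rand(K, N).astype(np.float32)   # row-major [K, N]

C_T = B.T @ A.T                    # what the column-major call would produce
assert np.allclose(C_T.T, A @ B)   # same C; .T is a view, so no copies made
```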

The CPU versions of these are pretty doable. See the QR factorization as an example: https://github.com/ml-explore/mlx/blob/main/mlx/backend/common/qrf.cpp GPU support is more involved, as I don’t think there are many open source...

We'd love to have this. Our first priority is quantization, but when we have the bandwidth we can look into adding Flash attention. (Note: PRs are welcome.)
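
For context on what a fused kernel would replace, here is unfused scaled dot-product attention written with existing MLX ops (shapes and the function name are just for illustration, not anything from the library):

```python
import math
import mlx.core as mx

def naive_attention(q, k, v):
    # q, k, v: [batch, heads, seq_len, head_dim]
    # Unfused version: the full [batch, heads, seq_len, seq_len] score
    # matrix is materialized, which is exactly the memory traffic a
    # flash-attention kernel is designed to avoid.
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = (q * scale) @ mx.transpose(k, (0, 1, 3, 2))
    return mx.softmax(scores, axis=-1) @ v

q = k = v = mx.random.normal((1, 8, 128, 64))
mx.eval(naive_attention(q, k, v))
```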

It should never segfault, so that's not something you are doing sub-optimally with `eval`. It looks like a bug to me, but we'll have to investigate further.

So for some reason it segfaults if I don't put the `mx.eval(T)` at the beginning (e.g. before starting the loop). That definitely smells like a bug. But otherwise you don't...

Oops, sorry about that! It's very odd that doing `T_old = mx.broadcast_to(T, shape=T.shape)` is faster than `T_old = T` 🤔. More to investigate here.

I'm not able to reproduce the segfault. Does it segfault for you using the settings you have in the example above? Also what hardware / OS are you using?

I am able to reproduce the segfault on my machine. The current (and likely) hypothesis is that the segfault is a result of a stack overflow during the `eval` when we...
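
If that hypothesis is right, one workaround in the meantime is to keep the lazy graph shallow by evaluating inside the loop rather than only at the end (the loop body below is just a placeholder for the real update):

```python
import mlx.core as mx

T = mx.zeros((1024, 1024))
for step in range(10_000):
    T = T + 1.0      # placeholder for the real update
    mx.eval(T)       # evaluate every iteration (or every N steps) so the graph
                     # handed to eval stays a few nodes deep instead of growing
                     # with the whole loop
```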

It looks like we have some work to do for those shapes in our `conv2D`. I think your implementation has much more potential to be fast. If we can make the...