Awni Hannun
Title is pretty self-explanatory. Might also be worth lumping in `isfinite`, `isposinf`, `isneginf`.
Add Metal backend support for the FFT primitive, as mentioned here: https://github.com/ml-explore/mlx-examples/issues/249
- [x] Sort dataset prior to batching for more consistent lengths
- [x] Compile non-MoE models
- [x] Add checkpointing as an option `--grad-checkpoint`

## Compile Benchmarks

Decent gain...
E.g., something like the below so saved models can be used in the Transformers library:
```
metadata={"format": "pt"})
```
See the original issue in MLX: https://github.com/ml-explore/mlx/issues/743#issuecomment-1965427589
- Use SDPA for Mixtral
- Use no repeat for phi
- Store keys in fp16 for phi, phixtral

Also closes #526
Decreasing `131072` to `131071` produces the right output, but above that the outputs don't match as they should. ```python import mlx.core as mx w = mx.random.uniform(shape=(32, 32 * 4)) x...
It's relatively simple to implement, but maybe worth adding since it's also quite common. Here's a possible implementation: ```python import mlx.core as mx from mlx.utils import tree_map def clip_grad_norm(grads, max_norm):...
Add for the CPU using LAPACK. For the GPU, [MPS has a Cholesky](https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixdecompositioncholesky?language=objc) which could be a good option to start with (following how we used to bind MPS...
Add a memory limit for command buffer packing and increase the default limit for the number of ops. Also added a device category property: - small is everything < Max...