mlx-swift-examples
Support for Batch Generation
I came across Awni's batch generation PR in mlx-lm and spent the weekend experimenting with porting it to MLX Swift. My implementation gets almost the same throughput as mlx-lm on my M5 MacBook Pro with Llama 3.2 3B (4-bit):
| Batch size | mlx-lm (tokens/s) | MLX Swift (tokens/s) |
|---|---|---|
| 1 | 61 | 62 |
| 2 | 122 | 118 |
| 32 | 349 | 344 |
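For context, batch generation needs prompts of different lengths to share one `[batch, length]` array. One common approach is to left-pad every prompt to the longest length and carry a mask of real tokens. This is an assumption for illustration, not necessarily how mlx-lm or my branch does it. The sketch below uses plain Swift arrays, with no MLX calls; `leftPad` and `padToken` are hypothetical names.

```swift
/// Hypothetical helper: left-pads token sequences to a common length so they
/// can be stacked into a single [batch, length] array for a batched prefill.
/// Returns the padded tokens and a mask where `true` marks a real (non-pad) token.
func leftPad(_ prompts: [[Int]], padToken: Int) -> (tokens: [[Int]], mask: [[Bool]]) {
    // The longest prompt determines the shared sequence length.
    let maxLength = prompts.map(\.count).max() ?? 0
    var tokens: [[Int]] = []
    var mask: [[Bool]] = []
    for prompt in prompts {
        let padCount = maxLength - prompt.count
        // Padding goes on the left, so the last column always holds each
        // sequence's most recent real token.
        tokens.append(Array(repeating: padToken, count: padCount) + prompt)
        mask.append(Array(repeating: false, count: padCount)
            + Array(repeating: true, count: prompt.count))
    }
    return (tokens, mask)
}

let (tokens, mask) = leftPad([[1, 2, 3], [4]], padToken: 0)
print(tokens)  // [[1, 2, 3], [0, 0, 4]]
print(mask)    // [[true, true, true], [false, false, true]]
```

Left padding keeps the newest token of every sequence in the final column. Next-token logits can then be read from the same position for the whole batch, and each decode step appends exactly one column. The mask keeps attention from looking at pad positions.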
There are probably some subtle optimizations I haven't found yet, so a review would help.
I'm opening this issue in case someone is already working on this. If not, I can clean up my branch and send a PR!