Awni Hannun comments

1014 comments



feat(mlx_lm): support batch input in `generate()`

No worries, just checking. I'll follow up in a week or so.

feat(mlx_lm): support batch input in `generate()`

Will get to this soon. Sorry for the delay.

feat(mlx_lm): support batch input in `generate()`

@qinxuye could you say a bit more about what you are looking for?

Poor Speculative Decoding Performance on M2 Ultra

I ran a couple of benchmarks on M3 Max and M2 Ultra. As expected, we get much better scaling of the big model with respect to sequence length on M3 Max than M2...

Poor Speculative Decoding Performance on M2 Ultra

On the optimistic side, from conversations with @angeloskath and @barronalex, there is likely room to improve small-batch qmm (quantized matrix multiplication), which should help this use case considerably.
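To see why the big model's scaling with sequence length matters for speculative decoding, a back-of-the-envelope model can help. This is a sketch under the usual idealized assumptions, not something from the thread: each of the `k` draft tokens is accepted independently with probability `alpha`, `draft_cost` is the draft model's per-token time relative to one big-model token, and `verify_cost` is the time for the big model to score `k + 1` tokens in one pass, relative to one token. Both function names are hypothetical.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens produced per verify step with k draft tokens.

    Assumes each draft token is accepted independently with probability
    alpha; the big model always contributes one token of its own.
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def speculative_speedup(alpha: float, k: int, draft_cost: float, verify_cost: float) -> float:
    """Idealized speedup over plain decoding with the big model.

    draft_cost:  time for one draft-model token / time for one big-model token.
    verify_cost: time for the big model to score k + 1 tokens in one pass /
                 time for one big-model token (1.0 means verification is "free").
    """
    step_time = k * draft_cost + verify_cost
    return expected_tokens_per_step(alpha, k) / step_time


if __name__ == "__main__":
    # Made-up parameters, only to show the trend as verification gets costlier.
    for verify_cost in (1.0, 1.5, 2.0):
        s = speculative_speedup(alpha=0.8, k=4, draft_cost=0.1, verify_cost=verify_cost)
        print(f"verify_cost={verify_cost}: speedup ~ {s:.2f}x")
```

With these illustrative numbers the speedup drops from about 2.4x to about 1.4x as `verify_cost` goes from 1.0 to 2.0, which is why faster small-batch matmuls for the verification pass would matter so much here.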

KeyError: 'llama.context_length'

Could you explain what you ran to get that error?

Failing when trying to run DeepSeek-R1-3bit on 3 Studios M2 Ultra with 128GB RAM each

Yes. I added a fix for that in the most recent mlx-lm, but it's not on PyPI yet. If you build the package from source, it should work. I think 3x128...

[FEATURE REQUEST] Implementation of L-BFGS Optimizer in MLX

You might want to check out some of the optimizers in [mlx-optimizers](https://stockeh.github.io/mlx-optimizers/build/html/optimizers.html). Several of them approximate the Hessian and may work as well as Adam + L-BFGS.
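As background on what "approximating the Hessian" means for L-BFGS itself, its core is the two-loop recursion, which turns the last `m` position and gradient differences into an approximate inverse-Hessian-vector product. The sketch below is plain Python on lists rather than MLX arrays, uses a simple backtracking (Armijo) line search instead of the Wolfe line search usually paired with L-BFGS, and every function name in it is hypothetical; it is not the API of MLX or mlx-optimizers.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def two_loop_direction(grad: Vector, s_hist: List[Vector], y_hist: List[Vector]) -> Vector:
    """Approximate H^{-1} @ grad from stored pairs s = x_{k+1} - x_k, y = g_{k+1} - g_k."""
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        a = rho * dot(s, q)
        alphas.append(a)
        q = [qi - a * yi for qi, yi in zip(q, y)]
    # Initial inverse-Hessian guess: scaled identity gamma * I.
    gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest (alphas were stored newest-first).
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return r


def lbfgs_minimize(
    f_and_grad: Callable[[Vector], Tuple[float, Vector]],
    x0: Vector,
    m: int = 5,
    max_iter: int = 100,
    tol: float = 1e-8,
) -> Vector:
    x = list(x0)
    f, g = f_and_grad(x)
    s_hist: List[Vector] = []
    y_hist: List[Vector] = []
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        d = [-v for v in two_loop_direction(g, s_hist, y_hist)]
        # Backtracking line search with the Armijo sufficient-decrease condition.
        t = 1.0
        while True:
            x_new = [xi + t * di for xi, di in zip(x, d)]
            f_new, g_new = f_and_grad(x_new)
            if f_new <= f + 1e-4 * t * dot(g, d) or t < 1e-10:
                break
            t *= 0.5
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        # Keep the pair only if it has positive curvature (keeps H^{-1} positive definite).
        if dot(s, y) > 1e-12:
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, f, g = x_new, f_new, g_new
    return x


def quadratic(x: Vector) -> Tuple[float, Vector]:
    # f(x) = x0^2 + 2*x1^2 - 2*x0 - 4*x1, minimized at (1, 1).
    f = x[0] ** 2 + 2 * x[1] ** 2 - 2 * x[0] - 4 * x[1]
    g = [2 * x[0] - 2, 4 * x[1] - 4]
    return f, g


if __name__ == "__main__":
    print(lbfgs_minimize(quadratic, [0.0, 0.0]))
```

Only `m` vector pairs are stored, so memory grows linearly in the parameter count rather than quadratically as an explicit Hessian would.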

python -m unittest discover python/tests returns various matmul runtime warnings

This is a NumPy / downstream bug and unrelated to MLX. I think it only happens on M4 with a recent enough OS and NumPy. More [here](https://github.com/numpy/numpy/issues/29820). And I think...

python -m unittest discover python/tests returns various matmul runtime warnings

It might be a bug in Accelerate, though MLX also uses Accelerate and we don't see those spurious warnings, so I'm not sure why they only come from NumPy.
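One way to check whether such a warning is spurious is to capture the `RuntimeWarning`s raised by a NumPy matmul and see whether the result is nonetheless all finite. The sketch below is a diagnostic idea only, assuming NumPy's default floating-point error handling (`np.seterr` left at "warn"); the helper name is hypothetical, and it does not fix the underlying issue tracked in the linked NumPy report.

```python
import warnings

import numpy as np


def matmul_with_warning_report(a, b):
    """Run a @ b and report any RuntimeWarnings it raised.

    A warning paired with an all-finite result hints that the warning may be
    spurious (e.g. floating-point status flags set inside the BLAS backend)
    rather than a real overflow, invalid, or divide-by-zero in the output.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", RuntimeWarning)
        out = np.matmul(a, b)
    runtime = [w for w in caught if issubclass(w.category, RuntimeWarning)]
    looks_spurious = bool(runtime) and bool(np.all(np.isfinite(out)))
    return out, [str(w.message) for w in runtime], looks_spurious


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((64, 64)).astype(np.float32)
    b = rng.standard_normal((64, 64)).astype(np.float32)
    out, messages, looks_spurious = matmul_with_warning_report(a, b)
    print("warnings:", messages)
    print("finite output despite warnings:", looks_spurious)
```

On machines without the problem, the message list should simply be empty.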