Awni Hannun
There is an open PR to add batch support for mlx-lm: https://github.com/ml-explore/mlx-examples/pull/948 Will try to get it landed soon if possible.
Closes #2724
I also added a chat command to MLX LM, which is a good use case for prompt cache re-use. The example is kind of fun to play with: ```...
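Roughly, the re-use looks like this with the Python API (a minimal sketch: the model repo is just a placeholder, and I'm assuming `make_prompt_cache` from `mlx_lm.models.cache` and the `prompt_cache` keyword to `generate`):

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

# Placeholder model; any mlx-community model should work the same way.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# One cache reused across turns: each turn's keys/values stay in the
# cache, so the earlier conversation is not re-processed.
prompt_cache = make_prompt_cache(model)

for question in ["Hi, who are you?", "What did I just ask you?"]:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
    )
    generate(
        model,
        tokenizer,
        prompt=prompt,
        prompt_cache=prompt_cache,
        verbose=True,
    )
```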
> I am wondering what is the point of the extra state in the KV cache? Is anybody using it now? Is there any reason it is set to the...
Yea I thought about a separate property and/or overriding `__getstate__` and `__setstate__`. The main downside I didn't like is that all the caches would need to implement it... but maybe the...
> I added a small base class that implements the empty meta state and makes the load/save code a tad cleaner? Should I push it on top or we...
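Something like this is what I have in mind (an illustrative sketch, not the actual mlx-lm code; the class and attribute names here are made up):

```python
class _BaseCache:
    # Default for caches with no extra state: nothing beyond the
    # keys/values needs to be saved, so the load/save code can treat
    # every cache type uniformly.
    @property
    def meta_state(self):
        return ""

    @meta_state.setter
    def meta_state(self, v):
        pass


class RotatingKVCache(_BaseCache):
    def __init__(self, max_size, keep=0):
        self.max_size = max_size
        self.keep = keep

    # Caches that do carry extra state override meta_state so that
    # saving and loading can round-trip it.
    @property
    def meta_state(self):
        return tuple(map(str, (self.keep, self.max_size)))

    @meta_state.setter
    def meta_state(self, v):
        self.keep, self.max_size = map(int, v)
```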
Ok, I tested prompt caching with a few different models / cache types and it seems to work well. I'm going to merge this. As a follow-up we should...
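For reference, the round-trip I tested is roughly this (assuming the save/load helpers in `mlx_lm.models.cache`; the model repo and file name are placeholders):

```python
from mlx_lm import load
from mlx_lm.models.cache import (
    make_prompt_cache,
    save_prompt_cache,
    load_prompt_cache,
)

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Build a cache, then persist it to disk as a safetensors file.
cache = make_prompt_cache(model)
save_prompt_cache("prompt_cache.safetensors", cache)

# Later (or in another process), restore it and keep generating.
cache = load_prompt_cache("prompt_cache.safetensors")
```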
Thanks for making so many MLX models, that's super awesome! > Or somehow run this on an external drive or something? That should already work if you specify the correct...
I think it makes sense to minimize the complexity of the `generate` function (which is becoming a bit spaghetti) by splitting the batched generation into a separate function called...
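Concretely, I'm imagining a signature along these lines (just a sketch; `batch_generate` and its arguments are hypothetical, not an existing API):

```python
from typing import List


def batch_generate(
    model,
    tokenizer,
    prompts: List[str],
    max_tokens: int = 256,
    **kwargs,
) -> List[str]:
    """Generate completions for a batch of prompts.

    Hypothetical split-out of the batched path so the single-prompt
    `generate` function stays simple. The idea would be to left-pad the
    prompts to a common length, decode all sequences in lock-step, and
    mask out sequences that have already produced an EOS token.
    """
    ...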
@llllvvuu are you coming back to this?