Awni Hannun

1014 comments by Awni Hannun

There is an open PR to add batch support to mlx-lm: https://github.com/ml-explore/mlx-examples/pull/948. I'll try to get it landed soon if possible.

I also added a chat command to MLX LM, which is a good use case for prompt cache reuse. The example is kind of fun to play with: ...
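
For context, a minimal sketch of the kind of reuse the chat command gets from the prompt cache. It assumes the `make_prompt_cache` helper in `mlx_lm.models.cache` and that `generate` accepts a `prompt_cache` argument; both are version-dependent assumptions, not a quote of the actual command's internals:

```python
# Hedged sketch of prompt-cache reuse across chat turns.
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
cache = make_prompt_cache(model)

for turn in ["Hello!", "What did I just say?"]:
    # Only the new message is templated; earlier turns already live in
    # the cache's KV state, so they are not re-processed.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": turn}],
        add_generation_prompt=True,
        tokenize=False,
    )
    response = generate(model, tokenizer, prompt=prompt, prompt_cache=cache)
    print(response)
```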

> I am wondering what is the point of the extra state in the KV cache? Is anybody using it now? Is there any reason it is set to the...

Yeah, I thought about a separate property... and/or overriding `__getstate__` and `__setstate__`. The main downside I didn't like is that all the caches would need to implement it... but maybe the...

> I added a small base class that implements the empty meta state and makes the load/save code a tad bit cleaner. Should I push it on top or we...
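
A rough sketch of the base-class idea under discussion, with illustrative names rather than the actual mlx-lm classes: caches inherit an empty `meta_state` by default and only override it when they have extra scalars to persist.

```python
# Illustrative sketch only; class and property names are assumptions,
# not the actual mlx-lm API.
class _BaseCache:
    @property
    def meta_state(self):
        # Most caches have no extra state beyond their keys/values.
        return ""

    @meta_state.setter
    def meta_state(self, v):
        if v:
            raise ValueError("This cache has no meta state to restore.")

class RotatingKVCache(_BaseCache):
    def __init__(self, max_size, offset=0):
        self.max_size = max_size
        self.offset = offset

    @property
    def meta_state(self):
        # Extra scalars the loader needs to reconstruct the cache.
        return (str(self.max_size), str(self.offset))

    @meta_state.setter
    def meta_state(self, v):
        self.max_size, self.offset = map(int, v)
```

The save/load path can then treat every cache uniformly: serialize `cache.meta_state` alongside the tensors and assign it back on load, with no per-class special cases.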

Ok, I tested prompt caching with a few different models / cache types and it seems to work well. I'm going to merge this. As a follow-up, we should...
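
For reference, a minimal sketch of the save/reload round trip being tested, assuming `save_prompt_cache` / `load_prompt_cache` helpers in `mlx_lm.models.cache` (helper names and signatures may differ by version):

```python
# Hedged sketch: pre-fill a cache with a long shared prefix once, save
# it, then reuse it so later prompts skip re-processing the prefix.
from mlx_lm import load, generate
from mlx_lm.models.cache import (
    make_prompt_cache,
    save_prompt_cache,
    load_prompt_cache,
)

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

prefix = "You are a helpful assistant. " * 100  # a long shared prefix
cache = make_prompt_cache(model)
# Generate a single token just to force the prefix through prefill.
generate(model, tokenizer, prompt=prefix, max_tokens=1, prompt_cache=cache)
save_prompt_cache("prefix_cache.safetensors", cache)

# Later (or in another process): reload and continue from the prefix.
cache = load_prompt_cache("prefix_cache.safetensors")
print(generate(model, tokenizer, prompt="First question?", prompt_cache=cache))
```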

Thanks for making so many MLX models, that's super awesome!

> Or somehow run this on an external drive or something?

That should already work if you specify the correct...
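
The suggestion is cut off, so the following is only a guess at what it refers to: pointing the Hugging Face cache (which MLX LM downloads models into) at an external drive via the standard `HF_HOME` environment variable.

```python
# A guess, not the thread's actual answer: relocate the Hugging Face
# cache to an external drive before loading. HF_HOME is a standard
# Hugging Face env var; the path here is hypothetical.
import os

os.environ["HF_HOME"] = "/Volumes/External/hf"  # hypothetical mount point

from mlx_lm import load  # import after setting the env var

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
```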

To minimize the complexity of the `generate` function (which is becoming a bit spaghetti), I think it makes sense to split the batched generation out into a separate function called...
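
A rough, simplified sketch of that split; the function name, padding scheme, and greedy loop are illustrative and deliberately skip the KV cache, the padding mask, and early stopping at EOS that a real implementation needs:

```python
# Hypothetical sketch of a separate batched path. For clarity it
# re-runs the whole sequence each step and decodes greedily; the
# actual PR's version will differ on all of these points.
import mlx.core as mx

def batch_generate(model, tokenizer, prompts, max_tokens=64):
    ids = [tokenizer.encode(p) for p in prompts]
    longest = max(len(i) for i in ids)
    pad = tokenizer.pad_token_id or tokenizer.eos_token_id
    # Left-pad so every prompt ends at the same position.
    batch = mx.array([[pad] * (longest - len(i)) + i for i in ids])

    generated = [[] for _ in prompts]
    for _ in range(max_tokens):
        logits = model(batch)  # (batch, seq, vocab)
        next_tokens = mx.argmax(logits[:, -1, :], axis=-1)
        for row, tok in enumerate(next_tokens.tolist()):
            generated[row].append(tok)
        batch = mx.concatenate(
            [batch, mx.expand_dims(next_tokens, 1)], axis=1
        )
    return [tokenizer.decode(g) for g in generated]
```

Keeping this out of `generate` means the single-prompt path stays simple while the batched path can grow its own masking and stopping logic.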

@llllvvuu are you coming back to this?