Awni Hannun
There is an open PR to add batch support for mlx-lm: https://github.com/ml-explore/mlx-examples/pull/948 Will try to get it landed soon if possible.
Closes #2724
I also added a chat command to MLX LM, which is a good use case for prompt cache re-use. The example is kind of fun to play with: ```...
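Roughly, the re-use looks like this with the Python API (a minimal sketch: the model repo is just a placeholder, and I'm assuming `make_prompt_cache` from `mlx_lm.models.cache` and the `prompt_cache` keyword to `generate`):

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

# Placeholder model; any mlx-community model should work the same way.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# One cache reused across turns: each turn's keys/values stay in the
# cache, so the earlier conversation is not re-processed.
prompt_cache = make_prompt_cache(model)

for question in ["Hi, who are you?", "What did I just ask you?"]:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
    )
    generate(
        model,
        tokenizer,
        prompt=prompt,
        prompt_cache=prompt_cache,
        verbose=True,
    )
```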
> I am wondering what is the point of the extra state in the KV cache? Is anybody using it now? Is there any reason it is set to the...
Yea I thought about a separate property and/or overriding `__getstate__` and `__setstate__`. The main downside I didn't like is that all the caches would need to implement it... but maybe the...
> I added a small base class that implements the empty meta state and makes the load/save code a tad cleaner? Should I push it on top or we...
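Something like this is what I have in mind (an illustrative sketch, not the actual mlx-lm code; the class and attribute names here are made up):

```python
class _BaseCache:
    # Default for caches with no extra state: nothing beyond the
    # keys/values needs to be saved, so the load/save code can treat
    # every cache type uniformly.
    @property
    def meta_state(self):
        return ""

    @meta_state.setter
    def meta_state(self, v):
        pass


class RotatingKVCache(_BaseCache):
    def __init__(self, max_size, keep=0):
        self.max_size = max_size
        self.keep = keep

    # Caches that do carry extra state override meta_state so that
    # saving and loading can round-trip it.
    @property
    def meta_state(self):
        return tuple(map(str, (self.keep, self.max_size)))

    @meta_state.setter
    def meta_state(self, v):
        self.keep, self.max_size = map(int, v)
```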
Ok, I tested prompt caching with a few different models / cache types and it seems to work well. I'm going to merge this. As a follow-up we should...
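For reference, the round-trip I tested is roughly this (assuming the save/load helpers in `mlx_lm.models.cache`; the model repo and file name are placeholders):

```python
from mlx_lm import load
from mlx_lm.models.cache import (
    make_prompt_cache,
    save_prompt_cache,
    load_prompt_cache,
)

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Build a cache, then persist it to disk as a safetensors file.
cache = make_prompt_cache(model)
save_prompt_cache("prompt_cache.safetensors", cache)

# Later (or in another process), restore it and keep generating.
cache = load_prompt_cache("prompt_cache.safetensors")
```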
Thanks for making so many MLX models, that's super awesome! > Or somehow run this on an external drive or something? That should already work if you specify the correct...
I think it makes sense to minimize the complexity of the `generate` function (which is becoming a bit spaghetti) by splitting the batched generation into a separate function called...
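Concretely, I'm imagining a signature along these lines (just a sketch; `batch_generate` and its arguments are hypothetical, not an existing API):

```python
from typing import List


def batch_generate(
    model,
    tokenizer,
    prompts: List[str],
    max_tokens: int = 256,
    **kwargs,
) -> List[str]:
    """Generate completions for a batch of prompts.

    Hypothetical split-out of the batched path so the single-prompt
    `generate` function stays simple. The idea would be to left-pad the
    prompts to a common length, decode all sequences in lock-step, and
    mask out sequences that have already produced an EOS token.
    """
    ...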
@llllvvuu are you coming back to this?