Nathan Tam

London, United Kingdom A Data scientist and an AI enthusiast.

Results 3 comments of


                                            Nathan Tam

feat(mlx_lm): support batch input in `generate()`

Just realised the attention mask has been mentioned in this PR, which is the reason I raised this issue #1044

Enable caching for 'generate' and 'stream_generate' functions to ensure persistence of cache across multiple requests

Just updating the title of the PR for clarity. Now KV cache of any generation can be reused for other requests with these changes.

Enable caching for 'generate' and 'stream_generate' functions to ensure persistence of cache across multiple requests

The code in server.py is modified accordingly to adapt to the changes made with `generate_step`. Prompt caching is available on server.py by default.