2 issues from mark

It would be great if MLX_lm supported a `--cache_prompt` flag like the one in llama.cpp's integration ([link to their discussion + eventual PR](https://github.com/ml-explore/mlx-examples/issues/new)). This would go a long way toward reducing latency...
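
For context on the latency claim: the win comes from prefilling a long, shared prompt prefix once, persisting its KV cache, and reusing it for every later request instead of re-running prefill each time. Below is a minimal sketch of that workflow in Python, assuming the prompt-cache helpers that recent mlx_lm versions expose (`make_prompt_cache`, `save_prompt_cache`, `load_prompt_cache`) and a `prompt_cache` keyword on `generate`; the helper names, locations, and the example model are assumptions and may differ between versions.

```python
# Rough sketch of prompt-cache reuse with mlx_lm's Python API.
# Helper names/locations are assumptions and may vary by version.
import mlx.core as mx
from mlx_lm import load, generate
from mlx_lm.models.cache import (
    load_prompt_cache,
    make_prompt_cache,
    save_prompt_cache,
)

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # example model

# One-time cost: prefill the long shared prefix and save its KV cache.
long_prefix = "You are a helpful assistant. <...long system prompt...>"
cache = make_prompt_cache(model)
tokens = mx.array(tokenizer.encode(long_prefix))[None]
model(tokens, cache=cache)          # forward pass fills the cache in place
mx.eval([c.state for c in cache])   # force evaluation before saving
save_prompt_cache("prefix_cache.safetensors", cache)

# Per-request cost: reload the cache and only prefill the short new turn.
cache = load_prompt_cache("prefix_cache.safetensors")
reply = generate(
    model,
    tokenizer,
    prompt="Summarize the instructions above in one sentence.",
    max_tokens=64,
    prompt_cache=cache,  # assumed keyword; this is what --cache_prompt would wire up
)
print(reply)
```

A `--cache_prompt` flag on the generation/serving side would essentially automate the reload-and-reuse half of this sketch.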

Currently, when you run `MLX_lm.cache_prompt`, the produced KV cache file contains the chat template, tokenizer config, model, and `max_kv_size`. It would be great if the actual text passed into it by the...
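
To see what is (and is not) stored today, one can dump the metadata saved alongside the cached tensors. A minimal sketch, assuming the cache is a safetensors file at a hypothetical path `prompt_cache.safetensors` and that `mx.load(..., return_metadata=True)` returns the string metadata alongside the arrays:

```python
# Inspect the metadata stored next to the cached KV tensors.
# File name is hypothetical; exact metadata keys may differ by version.
import mlx.core as mx

arrays, metadata = mx.load("prompt_cache.safetensors", return_metadata=True)

print("cached tensors:", list(arrays)[:4], "...")
for key, value in metadata.items():
    # Expected entries per the issue: chat template, tokenizer config,
    # model, max_kv_size -- but not the prompt text itself.
    preview = value if len(value) <= 80 else value[:80] + "..."
    print(f"{key}: {preview}")
```

Storing the original prompt as one more metadata string (e.g. a hypothetical `"prompt"` key) would let downstream tools check which text a given cache file actually corresponds to.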
