It would be great if MLX_lm supported a --cache_prompt flag like the one in llama.cpp ([link to their discussion + eventual PR](https://github.com/ml-explore/mlx-examples/issues/new)). This would be a big help in reducing latency...
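
For context, this is roughly the flow such a flag would expose. The sketch below uses the Python API rather than the CLI, and the prompt-cache helper (`make_prompt_cache`), the model repo name, and the exact keyword arguments are assumptions for illustration, not the requested flag itself:

```python
# Rough sketch of the prompt-caching flow, via the Python API as an
# assumption; helper and argument names may not match the CLI.
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache  # assumed helper location

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # placeholder model

long_system_prompt = "You are a helpful assistant. " * 200  # long shared prefix

# Pay the prefill cost for the shared prefix once...
prompt_cache = make_prompt_cache(model)
generate(model, tokenizer, prompt=long_system_prompt, max_tokens=1,
         prompt_cache=prompt_cache)

# ...then reuse the cached KV state for each follow-up request, which is
# where the latency savings come from.
for question in ["Summarize the rules.", "What changed in v0.3?"]:
    print(generate(model, tokenizer, prompt=question,
                   prompt_cache=prompt_cache, max_tokens=128))
```
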
Currently, when you run MLX_lm.cache_prompt, the resulting kv-cache file contains the chat template, tokenizer config, model, and max_kv_size. It would be great if the actual text passed into it by the...
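
As a point of reference, the metadata that ends up in the cache file can be inspected directly. This is a minimal sketch, assuming the kv-cache file is written in safetensors format; the file name and the exact metadata keys are assumptions:

```python
# Minimal sketch: dump the metadata stored in a kv-cache file produced by
# MLX_lm.cache_prompt, assuming it is a safetensors file. The path is a
# hypothetical example.
from safetensors import safe_open

with safe_open("prompt_cache.safetensors", framework="numpy") as f:
    metadata = f.metadata() or {}
    for key, value in metadata.items():
        # Expected to include the chat template, tokenizer config, model,
        # and max_kv_size; the original prompt text is what this issue
        # asks to have added.
        print(f"{key}: {str(value)[:80]}")
```
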