It would be great if MLX_lm supported a --cache_prompt flag like the one in llama.cpp ([link to their discussion + eventual PR](https://github.com/ml-explore/mlx-examples/issues/new)). This would be a big help in reducing latency...
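
For context, this is roughly the flow such a flag would expose. The sketch below uses the Python API rather than the CLI, and the prompt-cache helper (`make_prompt_cache`), the model repo name, and the exact keyword arguments are assumptions for illustration, not the requested flag itself:

```python
# Rough sketch of the prompt-caching flow, via the Python API as an
# assumption; helper and argument names may not match the CLI.
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache  # assumed helper location

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # placeholder model

long_system_prompt = "You are a helpful assistant. " * 200  # long shared prefix

# Pay the prefill cost for the shared prefix once...
prompt_cache = make_prompt_cache(model)
generate(model, tokenizer, prompt=long_system_prompt, max_tokens=1,
         prompt_cache=prompt_cache)

# ...then reuse the cached KV state for each follow-up request, which is
# where the latency savings come from.
for question in ["Summarize the rules.", "What changed in v0.3?"]:
    print(generate(model, tokenizer, prompt=question,
                   prompt_cache=prompt_cache, max_tokens=128))
```
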
Currently, when you run MLX_lm.cache_prompt, the resulting kv-cache file contains the chat template, tokenizer config, model, and max_kv_size. It would be great if the actual text passed into it by the...
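
As a point of reference, the metadata that ends up in the cache file can be inspected directly. This is a minimal sketch, assuming the kv-cache file is written in safetensors format; the file name and the exact metadata keys are assumptions:

```python
# Minimal sketch: dump the metadata stored in a kv-cache file produced by
# MLX_lm.cache_prompt, assuming it is a safetensors file. The path is a
# hypothetical example.
from safetensors import safe_open

with safe_open("prompt_cache.safetensors", framework="numpy") as f:
    metadata = f.metadata() or {}
    for key, value in metadata.items():
        # Expected to include the chat template, tokenizer config, model,
        # and max_kv_size; the original prompt text is what this issue
        # asks to have added.
        print(f"{key}: {str(value)[:80]}")
```
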