Thomas Liao
Sounds good. I'd say maybe it can be enabled via a command-line flag... What are your thoughts?
Finally getting around to trying some stuff with this.
Ah yeah, that might depend on the model you're using. Phi seems to have issues; perhaps try Mistral Large?
Hi @heislera763, great find! I've always wondered whether that asymmetrical memory usage was the KV cache, but never knew about the -nkvo flag. It seems like the best default configuration would...
Hmm, I've been thinking of doing a rewrite of the whole generation system, so I'd imagine that might be the only real way to properly address this. Will add...
Thank you! Would you mind making that into a PR? I'd be happy to merge it!