sparseml: [WiP] Fixing KV cache injection for Llama and Mistral

Open dbogunowicz opened this issue 1 year ago • 3 comments

Apr 16 '24 13:04 dbogunowicz

@abhinavnmagic can I have reviews and testing?

Apr 22 '24 10:04 dbogunowicz

Does this PR fix ONNX export for quantized models, pruned-only models, or both? I will test accordingly.

Apr 22 '24 23:04 abhinavnmagic

@abhinavnmagic for all the llama models, both quant and non-quant

Apr 23 '24 11:04 dbogunowicz