sparseml

[WiP] Fixing KV cache injection for Llama and Mistral

Open dbogunowicz opened this issue 1 year ago • 3 comments

dbogunowicz, Apr 16 '24 13:04

@abhinavnmagic could you review and test this?

dbogunowicz, Apr 22 '24 10:04

Does this PR fix ONNX export for quantized models, for pruned-only models, or for both? I will test accordingly.

abhinavnmagic, Apr 22 '24 23:04

@abhinavnmagic it covers all the Llama models, both quantized and non-quantized.

dbogunowicz, Apr 23 '24 11:04