[Experimental][StarCode] KV Cache Injection

Open dbogunowicz opened this issue 1 year ago • 0 comments

Feature Description

The results of my experimentation with the tiny_starcoder model.

Findings:

  • the original KV cache is not exposed as separate arrays (past_key_values.{attn_block_id}.values and past_key_values.{attn_block_id}.keys), but as a single joint array of keys and values. I did not get to look into breaking the two apart, but from analyzing the ONNX graph I do not see why we could not do it.
  • the causal mask for this model has different dimensions than what we usually assume. This could be patched by adding a node right after the causal_mask input that applies the appropriate permutation to the input.
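
To make the two findings concrete, here is a minimal NumPy sketch of both operations. It is not taken from the experimental branch. The shapes are assumptions: the joint cache is taken to be (batch, seq_len, 2 * head_dim) with keys in the first half of the last axis, and the mask is taken to need a swap of axes 1 and 2. The helper names split_fused_kv and permute_causal_mask are hypothetical. In the ONNX graph the same steps would presumably be a Split node and a Transpose node.

```python
import numpy as np


def split_fused_kv(fused_kv: np.ndarray, head_dim: int):
    """Split a joint key/value cache into separate key and value arrays.

    Assumption: fused_kv has shape (batch, seq_len, 2 * head_dim), with keys
    stored in the first half of the last axis and values in the second half.
    """
    if fused_kv.shape[-1] != 2 * head_dim:
        raise ValueError("last axis must have size 2 * head_dim")
    keys, values = np.split(fused_kv, 2, axis=-1)
    return keys, values


def permute_causal_mask(mask: np.ndarray) -> np.ndarray:
    """Permute a causal mask to the layout the model expects.

    Assumption: the input is (batch, 1, query_len, key_len) and the model
    wants (batch, query_len, 1, key_len), so axes 1 and 2 are swapped.
    """
    return np.transpose(mask, (0, 2, 1, 3))


# Toy dimensions, chosen only for illustration.
batch, seq_len, head_dim = 1, 4, 8

fused = np.arange(batch * seq_len * 2 * head_dim, dtype=np.float32)
fused = fused.reshape(batch, seq_len, 2 * head_dim)
keys, values = split_fused_kv(fused, head_dim)

# Lower-triangular boolean mask with two leading singleton axes.
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))[None, None]
permuted = permute_causal_mask(mask)
```

Concatenating keys and values along the last axis restores the joint array, so the split loses no information. Whether the real layout matches these assumptions would need to be confirmed against the exported graph.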

This is an experimental branch. I am pausing development on it for now due to other priorities, and it should be revisited in the future.

dbogunowicz, Feb 15 '24 13:02