Update model to use key-value caching, similar to PyTorch

Open · vade opened this issue 2 years ago · 1 comment

See this comment from Matthijs, the CoreML ninja :D

https://twitter.com/mhollemans/status/1618639882402025473?s=46&t=bOTr_fvEuCVEiBiGAGuSKw

I'm guessing the steps are:

1. Identify which layers need additional inputs and outputs on the CoreML model so the cache tensors are exposed.
2. Update the export logic.
3. Figure out the external logic that feeds the cache back in and updates the tensors correctly between predictions.
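Here is a rough sketch of what steps 1 and 3 boil down to. It assumes a single-head causal self-attention layer, uses NumPy instead of the real PyTorch/CoreML stack, and assumes the converted model keeps no state between predictions. Under that assumption the cache has to be explicit: the layer takes `past_k`/`past_v` for tokens already processed and returns the grown tensors, which the caller passes back in on the next call. `attend_with_cache`, `full_recompute` and `incremental` are hypothetical names, not Whisper or coremltools APIs.

```python
import numpy as np


def attend_with_cache(x_new, w_q, w_k, w_v, past_k, past_v):
    """Single-head causal self-attention for the newest tokens only.

    x_new:          (t_new, d) embeddings of tokens not yet processed
    past_k, past_v: (t_past, d) cached keys/values (t_past may be 0)
    Returns (output, k, v), where k/v now also hold the new tokens' rows.
    """
    q = x_new @ w_q
    # Keys/values for old tokens are reused; only the new rows are computed.
    k = np.concatenate([past_k, x_new @ w_k], axis=0)
    v = np.concatenate([past_v, x_new @ w_v], axis=0)
    t_past, t_new = past_k.shape[0], x_new.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (t_new, t_past + t_new)
    # Causal mask: new token i sees every cached token plus new tokens <= i.
    cols = np.arange(t_past + t_new)
    rows = t_past + np.arange(t_new)
    scores = np.where(cols[None, :] <= rows[:, None], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, k, v


def full_recompute(x, w_q, w_k, w_v):
    # No cache: attend over the whole sequence in one call.
    empty = np.zeros((0, x.shape[1]))
    out, _, _ = attend_with_cache(x, w_q, w_k, w_v, empty, empty)
    return out


def incremental(x, w_q, w_k, w_v):
    # External loop: feed one token at a time, threading the cache through.
    k_cache = np.zeros((0, x.shape[1]))
    v_cache = np.zeros((0, x.shape[1]))
    outs = []
    for t in range(x.shape[0]):
        out, k_cache, v_cache = attend_with_cache(
            x[t:t + 1], w_q, w_k, w_v, k_cache, v_cache
        )
        outs.append(out)
    return np.concatenate(outs, axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
print(np.allclose(full_recompute(x, w_q, w_k, w_v), incremental(x, w_q, w_k, w_v)))  # True
```

In an actual export I'd expect one `past_k`/`past_v` input pair and one output pair per decoder layer. Since the cache grows by one row per step, the converted model would probably need flexible input shapes, or a fixed maximum-length cache plus a mask. That is a guess at the design, not something confirmed in the linked tweet.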

Jan 26 '23 16:01 vade

See: https://scale.com/blog/pytorch-improvements#Text%20Translation

and

https://github.com/openai/whisper/blob/f82bc59f5ea234d4b97fb2860842ed38519f7e65/whisper/decoding.py#L134
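As I read the linked Whisper code, the PyTorch path keeps a `kv_cache` between decoder calls. Once the initial prompt has been processed, it passes only the newest token (`tokens[:, -1:]`). The sketch below strips that external loop (step 3 above) down to the essentials, with a toy stand-in for the decoder so it runs without Whisper. `ToyDecoder`, `greedy_decode` and `VOCAB` are hypothetical. The toy cache stores one value per token, computed once from the token id and its position, which is loosely analogous to a K/V row.

```python
VOCAB = 7  # hypothetical toy vocabulary size


class ToyDecoder:
    """Toy decoder: appends one cache entry per new token and scores the
    next token from the whole cache, so results depend on full history."""

    def __init__(self):
        self.calls = []  # how many tokens were fed on each call

    def __call__(self, new_tokens, cache):
        self.calls.append(len(new_tokens))
        start = len(cache)
        # Each entry is computed once from (position, token) and never redone.
        cache = cache + [(start + i + 1) * tok for i, tok in enumerate(new_tokens)]
        target = sum(cache) % VOCAB
        logits = [1.0 if c == target else 0.0 for c in range(VOCAB)]
        return logits, cache


def greedy_decode(decoder, prompt, steps, use_cache=True):
    tokens = list(prompt)
    cache = []
    for _ in range(steps):
        if use_cache:
            # First pass feeds the whole prompt; afterwards only the last token,
            # mirroring the `tokens[:, -1:]` trick in Whisper's decoding loop.
            new = tokens[-1:] if len(tokens) > len(prompt) else tokens
            logits, cache = decoder(new, cache)
        else:
            # Without a cache, every step reprocesses the full sequence.
            logits, _ = decoder(tokens, [])
        tokens.append(max(range(VOCAB), key=logits.__getitem__))
    return tokens


cached, plain = ToyDecoder(), ToyDecoder()
a = greedy_decode(cached, [3, 1, 4], steps=5)
b = greedy_decode(plain, [3, 1, 4], steps=5, use_cache=False)
print(a == b)        # True
print(cached.calls)  # [3, 1, 1, 1, 1]
print(plain.calls)   # [3, 4, 5, 6, 7]
```

If I'm reading it right, the same file also reorders the cached tensors (`rearrange_kv_cache`) when beam search reshuffles hypotheses. A CoreML version would have to do that reordering outside the model as well.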

Jan 26 '23 18:01 vade