optimum
Enable past_key_values for ORTModelForCausalLM
This PR lets the ORTModelForCausalLM class reuse the pre-computed attention keys and values (past_key_values) from earlier decoding steps when use_cache is set to True, which speeds up decoding.
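To show why reusing past_key_values helps, here is a minimal pure-Python sketch. It models a single attention head in a single layer and is not optimum's or ONNX Runtime's implementation. The function names (`decode_without_cache`, `decode_with_cache`) and the random weights are hypothetical and exist only for this illustration. Without a cache, every decoding step re-projects keys and values for the whole prefix. With a cache, each step projects only the newest token and appends the result to the stored keys and values. Both paths produce the same attention outputs.

```python
import math
import random


def matvec(W, x):
    # Compute W @ x for a matrix stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention for one query over the given keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def decode_without_cache(xs, Wq, Wk, Wv):
    # At step t, recompute K and V for every token in xs[0..t].
    outputs, kv_projections = [], 0
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        kv_projections += 2 * len(prefix)
        q = matvec(Wq, xs[t])
        outputs.append(attend(q, keys, values))
    return outputs, kv_projections


def decode_with_cache(xs, Wq, Wk, Wv):
    # Keep past keys/values (the "past_key_values") and project only the new token.
    past_keys, past_values = [], []
    outputs, kv_projections = [], 0
    for x in xs:
        past_keys.append(matvec(Wk, x))
        past_values.append(matvec(Wv, x))
        kv_projections += 2
        q = matvec(Wq, x)
        outputs.append(attend(q, past_keys, past_values))
    return outputs, kv_projections


def random_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
Wq, Wk, Wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
tokens = random_matrix(5, d)  # five toy token embeddings

full, n_full = decode_without_cache(tokens, Wq, Wk, Wv)
cached, n_cached = decode_with_cache(tokens, Wq, Wk, Wv)
print(n_full, n_cached)  # 30 10
```

For n decoding steps, the uncached path performs n(n+1) key/value projections, while the cached path performs 2n. That difference is the saving that use_cache=True is meant to capture. Real decoder models keep one cache entry per layer and per head, but the principle is the same.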
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?