Ella Charlaix
This PR allows the `ORTModelForCausalLM` class to reuse the pre-computed keys and values (`past_key_values`) to speed up decoding when `use_cache` is set to `True`. ## Before...
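To illustrate why reusing `past_key_values` speeds up decoding, here is a minimal pure-Python sketch of a single attention head decoded with and without a cache. It is a toy model, not the Optimum or ONNX Runtime implementation: `ToyAttention`, `step`, `decode` and the weight matrices are hypothetical names and values chosen for illustration; only `past_key_values` and `use_cache` come from the description above.

```python
import math


def project(x, w):
    """Compute x @ w for a vector x and a matrix w given as a list of rows."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w / total * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]


class ToyAttention:
    """Hypothetical single-head attention layer (illustration only)."""

    def __init__(self, wq, wk, wv):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.kv_projections = 0  # how many key/value projections were computed

    def step(self, tokens, past_key_values=None):
        """Process `tokens` and return (output for the last token, cache).

        Without a cache, `tokens` must hold the whole sequence so far.
        With a cache, `tokens` holds only the newly generated token.
        """
        keys, values = past_key_values if past_key_values is not None else ([], [])
        keys, values = list(keys), list(values)
        for x in tokens:
            keys.append(project(x, self.wk))
            values.append(project(x, self.wv))
            self.kv_projections += 1
        q = project(tokens[-1], self.wq)
        return attend(q, keys, values), (keys, values)


def decode(model, embeddings, use_cache):
    """Run one attention step per position, optionally reusing the cache."""
    outputs, past = [], None
    for t in range(1, len(embeddings) + 1):
        if use_cache:
            # Only the newest token is projected; older keys/values are reused.
            out, past = model.step(embeddings[t - 1:t], past)
        else:
            # The whole prefix is re-projected at every step.
            out, _ = model.step(embeddings[:t])
        outputs.append(out)
    return outputs


# Arbitrary toy weights and token embeddings (assumptions for the example).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.3]]
W_V = [[1.0, 2.0], [0.0, 1.0]]
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

cached_model = ToyAttention(W_Q, W_K, W_V)
plain_model = ToyAttention(W_Q, W_K, W_V)
cached_outputs = decode(cached_model, EMBEDDINGS, use_cache=True)
plain_outputs = decode(plain_model, EMBEDDINGS, use_cache=False)
```

Both runs produce the same outputs, but for these 4 tokens the cached run computes 4 key/value projections while the uncached run computes 10 (1 + 2 + 3 + 4). The gap grows quadratically with sequence length, which is the saving that `use_cache=True` is meant to capture.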
Refactoring of `ORTOptimizer`
Needs https://github.com/huggingface/transformers/pull/28141 to be merged and included in a release before this can be merged
Enable loading of ONNX models and ONNX Runtime inference using Optimum. Some updates might follow from https://github.com/huggingface/moon-landing/pull/7320 (WIP)
Docs are now deleted automatically after 30 days: https://github.com/huggingface/doc-builder/blob/main/.github/workflows/delete_old_pr_documentations.yml As done in Optimum: https://github.com/huggingface/optimum/pull/1565 cc @regisss
Needs https://github.com/huggingface/optimum/pull/1832 to be merged