optimum-habana
simplify the code for `cache_idx` calculation
What does this PR do?
Actually, I think it would be better to handle `cache_idx` in `prepare_inputs_for_generation()`. But since many models already implement `--bucket_internal`, I just simplified the existing implementation and tried to make it easier to understand.
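
For readers unfamiliar with the idea, here is a rough sketch of what a bucketed `cache_idx` calculation could look like. It is not the code from this PR: the function name `compute_cache_idx` and the parameters `token_idx`, `bucket_size` and `kv_cache_len` are hypothetical, and the sketch assumes that `cache_idx` is the upper bound of the KV-cache slice, rounded up to a multiple of the bucket size and capped at the cache length.

```python
def compute_cache_idx(token_idx: int, bucket_size: int, kv_cache_len: int) -> int:
    """Hypothetical helper: exclusive upper bound of the KV-cache slice to use.

    Assumption (not taken from the PR): the slice grows in steps of
    bucket_size and never exceeds kv_cache_len.
    """
    if bucket_size <= 0:
        # Bucketing disabled: use the whole cache.
        return kv_cache_len
    # Round token_idx up to the next multiple of bucket_size (ceiling division).
    rounded = -(-token_idx // bucket_size) * bucket_size
    # Cap at the allocated cache length.
    return min(rounded, kv_cache_len)


if __name__ == "__main__":
    for idx in (1, 128, 129, 500):
        print(idx, compute_cache_idx(idx, bucket_size=128, kv_cache_len=512))
```

Under these assumptions the value only changes when `token_idx` crosses a bucket boundary, which is what makes a single closed-form expression easier to follow than tracking the previous bucket index by hand.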
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?