fix: implement lazy imports for BetterTransformer
## Related Issue
Fixes import errors when using the `--no-bettertransformer` flag with transformers >= 4.49.
## Summary
This PR implements lazy imports for BetterTransformer to prevent import errors when it's disabled or when using incompatible transformers versions. BetterTransformer is deprecated in optimum and requires transformers < 4.49.
**Changes:**
- Replaced eager imports with a lazy-loading pattern
- Added a `_import_bettertransformer()` function for on-demand imports (see the sketch after this list)
- Wrapped the imports in try/except to handle version conflicts gracefully
- Returns early when `--no-bettertransformer` is used, avoiding unnecessary imports
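For illustration, here is a minimal sketch of the lazy-import pattern described above (see `acceleration.py` for the actual code). `_import_bettertransformer()` is the function named in the change list; the `to_bettertransformer()` wrapper, its signature, and the error message are assumptions for the example:

```python
def _import_bettertransformer():
    """Import BetterTransformer only when it is actually requested.

    Keeping the import inside this function means the incompatible
    import is never triggered at module load time.
    """
    try:
        # optimum's BetterTransformer entry point; importing it fails
        # on setups where transformers >= 4.49 is installed.
        from optimum.bettertransformer import BetterTransformer
    except ImportError as e:
        raise ImportError(
            "BetterTransformer requires optimum and transformers < 4.49. "
            "Pin compatible versions or pass --no-bettertransformer."
        ) from e
    return BetterTransformer


def to_bettertransformer(model, use_bettertransformer: bool = True):
    """Hypothetical wrapper showing the early-return behavior."""
    if not use_bettertransformer:
        # Early return: nothing BetterTransformer-related is imported,
        # so users on transformers >= 4.49 never see the ImportError.
        return model
    BetterTransformer = _import_bettertransformer()
    return BetterTransformer.transform(model)
```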
**Why this is needed:**
- BetterTransformer is deprecated and incompatible with transformers >= 4.49
- Users on modern transformers versions get import errors even with `--no-bettertransformer`
- This change allows the flag to work as intended
## Checklist
- [x] I have read the CONTRIBUTING guidelines.
- [ ] I have added tests to cover my changes.
- [ ] I have updated the documentation (docs folder) accordingly.
## Additional Notes
This change ensures backward compatibility for users who still want BetterTransformer while fixing the issue for users with newer dependencies.
## Codecov Report
:x: Patch coverage is 72.22222% with 5 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 79.54%. Comparing base (f84d8e2) to head (947044d).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...inity_emb/infinity_emb/transformer/acceleration.py | 72.22% | 5 Missing :warning: |
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main     #619   +/-   ##
=======================================
  Coverage   79.54%   79.54%
=======================================
  Files          43       43
  Lines        3486     3501    +15
=======================================
+ Hits         2773     2785    +12
- Misses        713      716     +3
```
Absolutely. According to this issue:
> please use transformers' attention implementation: https://huggingface.co/docs/transformers/main/en/llm_optims#attention and torch.compile (with static cache if decoder): https://huggingface.co/docs/transformers/main/en/llm_optims#static-kv-cache-and-torchcompile for the best possible performance (exceeding bettertransformer, which no one maintains! 💀).
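For concreteness, a hedged sketch of that recommended path based on the two linked guides; the model id and generation settings here are placeholders, not anything from this repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder decoder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="sdpa",  # transformers' built-in scaled-dot-product attention
)

# Static KV cache + torch.compile, per the static-kv-cache guide linked above
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Hello, world", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For encoder-only embedding models, the `attn_implementation="sdpa"` argument is the relevant part, since there is no KV cache to make static.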
What are your thoughts on this method vs. continuing to support the current bettertransformer?
@michaelfeil One option to keep bettertransformer as it was is #641, which extracts the bettertransformer code into its own package. This should get rid of the version error while keeping the original bettertransformer code. Not sure if/how long this approach will keep working.