Guard tokenizers import in tokenization_auto
What does this PR do?
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
The documentation is not available anymore as the PR was closed or merged.
Hi @hvaara, thanks for creating this PR!
Am I correct in saying this PR is to address an error when running `from transformers import AutoTokenizer` if `tokenizers` isn't installed?
I can see there's a similar import of `PreTrainedTokenizerFast` in `pipelines/__init__.py`. Could you add a safe import there as well? It seems it's just for types, so it can probably be placed in the `if TYPE_CHECKING` logic.
Hi @amyeroberts!
> Am I correct in saying this PR is to address an error when running `from transformers import AutoTokenizer` if `tokenizers` isn't installed?
Yes, it currently attempts to import `tokenizers` even when it is not installed, which raises an error.
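For readers following along, here is a minimal sketch of the kind of guard being discussed (not the actual diff in this PR; it assumes the `is_tokenizers_available` helper from `transformers.utils`):

```python
# Guard the optional fast-tokenizer import behind an availability check so
# `from transformers import AutoTokenizer` works even without `tokenizers`.
from transformers.utils import is_tokenizers_available

if is_tokenizers_available():
    from transformers import PreTrainedTokenizerFast
else:
    # Fall back gracefully; fast tokenizers are simply unavailable.
    PreTrainedTokenizerFast = None
```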
> I can see there's a similar import of `PreTrainedTokenizerFast` in `pipelines/__init__.py`. Could you add a safe import there as well?
SG! I created this PR in WIP mode mainly to see if the tests would pass. Handling this in `pipelines/__init__.py` was on my TODO list, assuming this PR passed the tests.
> It seems it's just for types, so it can probably be placed in the `if TYPE_CHECKING` logic.
Good feedback. I'll look into updating the logic to reflect this and also handle the case in `pipelines/__init__.py`.
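As an illustration of the `if TYPE_CHECKING` suggestion, here is a sketch (the helper function below is hypothetical, not code from `pipelines/__init__.py`):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers, never at runtime, so a missing
    # `tokenizers` dependency cannot break importing this module.
    from transformers import PreTrainedTokenizerFast

def tokenizer_name(tokenizer: "PreTrainedTokenizerFast") -> str:
    # Hypothetical helper: the string annotation gives type checkers the class
    # without pulling it into the runtime import path.
    return type(tokenizer).__name__
```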
Perhaps I should also create an additional (set of) test(s) to verify the test suite can be run when certain dependencies (like `tokenizers`) are not installed? Are there similar tests like this already? If so, can you please point me to them, and if not, how do you propose I create a test like that?
@hvaara Great! Looking forward to having this added along with the safe imports.
For the test questions, I'll hand this over to our expert @ydshieh who will know :)
Hi @hvaara
Thank you for the PR 🚀
> Perhaps I should also create an additional (set of) test(s) to verify the test suite can be run when certain dependencies (like `tokenizers`) are not installed? Are there similar tests like this already? If so, can you please point me to them, and if not, how do you propose I create a test like that?
Regarding the testing, did you see the test suite fail to collect or run some tests when `tokenizers` is not installed (rather than those tests being skipped)?
Hi @hvaara, regarding:
> Perhaps I should also create an additional (set of) test(s) to verify the test suite can be run when certain dependencies (like `tokenizers`) are not installed? Are there similar tests like this already? If so, can you please point me to them, and if not, how do you propose I create a test like that?
I think we can keep the PR simple as it is.
On our CI, we make sure the required dependencies are installed. And if we see errors caused by this issue, we fix them by installing the dependencies.
For community contributors, it's still much better to install the dependencies if they want/need to run the tests locally. Otherwise, the test results may be all green (i.e. pass) while (a lot of) the test methods are actually skipped. This may lead to a communication gap.
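For reference, that skipping typically comes from decorators in `transformers.testing_utils`; below is a minimal sketch assuming the `require_tokenizers` decorator (the test class itself is made up):

```python
import unittest

from transformers.testing_utils import require_tokenizers

class HypotheticalFastTokenizerTest(unittest.TestCase):
    @require_tokenizers
    def test_fast_tokenizer_import(self):
        # Reported as skipped (not failed) when `tokenizers` is missing,
        # which is how an all-green run can still hide missing coverage.
        from transformers import PreTrainedTokenizerFast
        self.assertTrue(callable(PreTrainedTokenizerFast))
```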
Thank you for the PR again!
> Perhaps I should also create an additional (set of) test(s) to verify the test suite can be run when certain dependencies (like `tokenizers`) are not installed? Are there similar tests like this already? If so, can you please point me to them, and if not, how do you propose I create a test like that?
> I think we can keep the PR simple as it is.
SGTM.
I'll update the commit one more time, but that should be the last one assuming the tests pass and it LGTY. Sorry for the spam.
Thanks a lot for the help, and for merging the PR!