transformers icon indicating copy to clipboard operation
transformers copied to clipboard

Guard tokenizers import in tokenization_auto

Open hvaara opened this issue 1 year ago • 4 comments

What does this PR do?

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

hvaara avatar Mar 21 '23 05:03 hvaara

The documentation is not available anymore as the PR was closed or merged.

Hi @hvaara, thanks for creating this PR!

Am I correct in saying this PR is to address an error when running: from transformers import AutoTokenizer if tokenizers isn't installed?

I can see there's a similar import of PreTrainedTokenizerFast in pipelines/__init__.py. Could you a safe import there as well? It seems it's just for types so can probably be placed in the if TYPE_CHECKING logic.

amyeroberts avatar Mar 22 '23 15:03 amyeroberts

Hi @amyeroberts!

Am I correct in saying this PR is to address an error when running: from transformers import AutoTokenizer if tokenizers isn't installed?

Yes, it will attempt to import tokenizers when it is not installed and raise an error.

I can see there's a similar import of PreTrainedTokenizerFast in pipelines/__init__.py. Could you a safe import there as well?

SG! I created this PR in WIP mode mainly to see if the tests would pass. Handling this in pipelines/__init__.py was on my TODO list assuming this PR passed tests.

It seems it's just for types so can probably be placed in the if TYPE_CHECKING logic.

Good feedback. I'll look into updating the logic to reflect this and also handle the case in pipelines/__init__.py.

Perhaps I should also create an additional (set of) test(s) to verify the test suite can be run when certain dependencies (like tokenizers) does not exist? Are there similar tests like this already? If so, can you please point me to them, and if not, how do you propose I create a test like that?

hvaara avatar Mar 24 '23 20:03 hvaara

@hvaara Great! Looking forward to having this added and safe imports.

For the test questions, I'll hand this over to our expert @ydshieh who will know :)

amyeroberts avatar Mar 24 '23 20:03 amyeroberts

Hi @hvaara

Thank you for the PR 🚀

Perhaps I should also create an additional (set of) test(s) to verify the test suite can be run when certain dependencies (like tokenizers) does not exist? Are there similar tests like this already? If so, can you please point me to them, and if not, how do you propose I create a test like that?

Regarding the testing, did you see the test suite failed to collect or run some tests when tokenizers is not installed? (rather than being skipped).

ydshieh avatar Mar 27 '23 14:03 ydshieh

Hi @hvaara Regarding

Perhaps I should also create an additional (set of) test(s) to verify the test suite can be run when certain dependencies (like tokenizers) does not exist? Are there similar tests like this already? If so, can you please point me to them, and if not, how do you propose I create a test like that?

I think we can keep the PR simple as it is.

On our CI, we make sure the required dependencies are installed. And if we see some errors caused by this issue, we will fix by installing the dependencies.

For community contributors, it's still much better for them to install the dependencies if they want/need to run the tests locally. Otherwise, the test results may be all green (i.e. pass), but (a lot) of the test methods are actually skipped. This may lead to a gap in the communication.

Thank you for the PR again!.

ydshieh avatar Mar 29 '23 06:03 ydshieh

Perhaps I should also create an additional (set of) test(s) to verify the test suite can be run when certain dependencies (like tokenizers) does not exist? Are there similar tests like this already? If so, can you please point me to them, and if not, how do you propose I create a test like that?

I think we can keep the PR simple as it is.

SGTM.

I'll update the commit one more time, but that should be the last one assuming the tests pass and it LGTY. Sorry for the spam.

hvaara avatar Mar 29 '23 07:03 hvaara

Thanks a lot for the help, and for merging the PR!

hvaara avatar Mar 30 '23 17:03 hvaara